Abstract



After presenting a general linear model as a framework for 
discussion, the present paper reviews five methodology errors that 
occur in educational research: (a) the use of stepwise methods; (b) 
the failure to consider in result interpretation the context 
specificity of analytic weights (e.g., regression beta weights, 
factor pattern coefficients, discriminant function coefficients, 
canonical function coefficients) that are part of all parametric 
quantitative analyses; (c) the failure to interpret both weights 
and structure coefficients as part of result interpretation; (d) 
the failure to recognize that reliability is a characteristic of 
scores, and not of tests; and (e) the incorrect interpretation of 
statistical significance and the related failure to report and 
interpret the effect sizes present in all quantitative analyses. In 
several cases small heuristic discriminant analysis data sets are 
presented to make more concrete and accessible the discussion of 
each of these five methodology errors. 

A well-known popular cliche holds that a chain is only as 
strong as its weakest link. So, too, a research study will be at 
least partially compromised by whatever is the weakest link in the 
sequence of activities that culminate in a completed investigation. 
Too often the weakest link in contemporary quantitative educational 
research involves the methodologies of statistical analysis. 

There is no question that educational research, whatever its 
methodological and other limits, has influenced and informed 
educational practice (cf. Gage, 1985; Travers, 1983). But there 
seems to be some consensus that "too much of what we see in print 
is seriously flawed" as regards research methods, and that "much of 
the work in print ought not to be there" (Tuckman, 1990, p. 22). 
Gall, Borg and Gall (1996) concurred, noting that "the quality of 
published studies in education and related disciplines is, 
unfortunately, not high" (p. 151). 

Empirical studies of published research involving methodology 
experts as judges corroborate these holistic impressions. For 
example, Hall, Ward and Comer (1988) and Ward, Hall and Schramm 
(1975) found that over 40% and over 60%, respectively, of published 
research was judged by methods experts as being seriously or 
completely flawed. Wandt (1967) and Vockell and Asher (1974) 
reported similar results from their empirical studies of the 
quality of published research. Dissertations, too, have been 
examined, and too often have been found methodologically wanting 
(cf. Thompson, 1988a, 1994a). 

Of course, it must be acknowledged that even a 
methodologically flawed study may still contribute something to our 
understanding of educational phenomena. As Glass (1979) noted, 
"Our research literature in education is not of the highest 
quality, but I suspect that it is good enough on most topics" (p. 12). 

But the problem with methodologically flawed studies is that 
these methodological flaws are entirely gratuitous. There is no 
upside to conducting incorrect statistical analyses. Usually a more 
thoughtful analysis is not appreciably more demanding in time or 
expertise than is a compromised choice. Rather, incorrect analyses 
arise from doctoral methodology instruction that teaches research 
methods as a series of rotely followed routines, as against 
thoughtful elements of a reflective enterprise; from doctoral 
curricula that seemingly have less and less room for quantitative 
statistics and measurement content, even while our knowledge base 
in these areas is burgeoning (Aiken, West, Sechrest, Reno, with 
Roediger, Scarr, Kazdin & Sherman, 1990; Pedhazur & Schmelkin, 
1991, pp. 2-3); and, in some cases, from an unfortunate atavistic 
impulse to somehow escape responsibility for analytic decisions by 
justifying choices, sans rationale, solely on the basis that the 
choices are common or traditional. 

Purpose of the Paper 

The purpose of the present paper is to review five methodology 
errors that occur in educational research: (a) the use of stepwise 
methods; (b) the failure to consider in result interpretation the 
context specificity of analytic weights (e.g., regression beta 
weights, factor pattern coefficients, discriminant function 
coefficients, canonical function coefficients) that are part of all 
parametric quantitative analyses; (c) the failure to interpret both 
weights and structure coefficients as part of result 
interpretation; (d) the failure to recognize that reliability is a 
characteristic of scores, and not of tests; and (e) the incorrect 
interpretation of statistical significance and the related failure 
to report and interpret the effect sizes present in all 
quantitative analyses. These comments are not new to the 
literature, or even to my own writing. But the field has seemingly 
remained somewhat recalcitrant in reflecting evolution as regards 
these methodological issues. 

The paper presents a conceptual overview of each concern. In 
several cases small heuristic data sets are presented to make more 
concrete and accessible the discussion of each of these five 
methodology errors. Because, as will be shown, all parametric 
methods are part of one general linear model (GLM) family, 
methodology dynamics illustrated for one heuristic example 
generalize to other related cases. In the present paper, 
discriminant analysis examples are consistently (but arbitrarily) 
employed as heuristics. Nevertheless, the illustrations necessarily 
generalize to other analyses within the GLM family. 

Delimitation 

Of course, methodological errors other than these five might 
have been cited. For example, empirical studies (Emmons, Stallings 
& Layne, 1990) show that, "In the last 20 years, the use of 
multivariate statistics has become commonplace" (Grimm & Yarnold, 
1995, p. vii), probably for very good reasons (Fish, 1988; 
Thompson, 1984, 1994e). Many such studies employ MANOVA (all to the 
good), but an unfortunate number of these studies then use ANOVA 
methods post hoc to explore detected multivariate effects (all to 
the bad) (Borgen & Seling, 1978). As I have noted elsewhere, 

The multivariate analysis evaluates multivariate 
synthetic variables, while the univariate analysis 
only considers univariate latent variables. Thus, 
univariate post hoc tests do not inform the 
researcher about the differences in the multivariate 
latent variables actually analyzed in the 
multivariate analysis... It is illogical to first 
declare interest in a multivariate omnibus system of 
variables, and to then explore detected effects in 
this multivariate world by conducting non- 
multivariate tests! (Thompson, 1994e, p. 14, 
emphasis in original) 

Similarly, all too often researchers erroneously interpret the 
eigenvalues in factor analysis as reflecting the variance contained 
in the individual factors after rotation (Thompson & Daniel, 
1996a). Or the discarding of variance in order to conduct ANOVA 
(cf. Thompson, 1985) or incorrect use of ANCOVA (Thompson, 1992b) 
might have been discussed. However, space precludes discussion here 
of all possible common methodology errors; the present discussion 
necessarily must be delimited in some manner. 

Premise Regarding Movement in Fields 

In considering these five methodology errors, it may be 
important for each of us to remember that, over the course of 
careers, fields, including the methodology-related fields, do move. 
Invariably, those of us in the late stages of our careers will 
confront the realization that some methodology choices in our own 
work, published decades earlier, no longer reflect standards of 
present best practice, or might even now be deemed fully 
inappropriate. Responsible scholars must remain open, and be 
willing to engage in continual reflection as to whether our own 
personal analytic traditions remain viable. 

Some have suggested that resistance to adopting revised 
methodological practice may in some cases be an artifact of denial, 
cognitive dissonance, and other classical psychological dynamics 
(Thompson, in press-d). For example, Schmidt and Hunter (1997) 
noted that "changing the beliefs and practices of a lifetime... 
naturally... provokes resistance" (p. 49). 
Similarly, Rozeboom (1960) observed that "the perceptual defenses 
of psychologists are particularly efficient when dealing with 
matters of methodology, and so the statistical folkways of a more 
primitive past continue to dominate the local scene" (p. 417). 

Recognizing the reality that fields move, and that to be fair 
works must be evaluated primarily against the methodological 
standards contemporary at the time of a given report, may 
facilitate helpful change. Prior to advocating selected changes, 
however, the general linear model (GLM) will be briefly described 
so as to provide a unifying conceptual framework for the remaining 
discussion. Structural equation modeling (SEM) will be presented 
as the most general case of the general linear model. 

Conceptual Framework: SEM as the General Linear Model (GLM) 

In one of his innumerable seminal contributions, the late 
Jacob "Jack" Cohen (1968) demonstrated that multiple regression 
subsumes all the univariate parametric methods as special cases, 
and thus provides a univariate general linear model that can be 
employed in all univariate analyses. Ten years later, in an equally 
important article Knapp (1978) presented the mathematical theory 
showing that canonical correlation analysis subsumes all the 
parametric analyses, both univariate and multivariate, as special 
cases. More concrete demonstrations of these relationships have 
also been offered (Fan, 1996; Thompson, 1984, 1991, in press-a). 
Both the Cohen (1968) and the Knapp (1978) articles were cited 
within a compilation of the most noteworthy methodology articles 
published during the last 50 years (Thompson & Daniel, 1996b). 

However, structural equation modeling (SEM) represents an even 
bigger conceptual tent subsuming more restrictive methods (Bagozzi, 
1981). Instructive illustrations of these relationships have been 
offered by Fan (1997). Prior to extracting the conceptual 
implications of the realization that a general linear model 
underlies all parametric analyses, a concrete demonstration that 
SEM is a general linear model subsuming canonical correlation 
analysis (CCA) (and its multivariate and univariate special cases) 
may be useful. 

Heuristic Illustration that SEM Subsumes CCA 

The illustration that SEM is a general linear model subsuming 
canonical correlation analysis (and its multivariate and univariate 
special cases) employs scores on seven variables (i.e., two in one 
set, and three in the other set) from the 301 cases in the 
Holzinger and Swineford (1939, pp. 81-91) data. These scores on 
ability batteries have classically been used as examples in both 
popular textbooks (Gorsuch, 1983, passim) and computer program 
manuals (Joreskog & Sorbom, 1989, pp. 97-104), and thus are 
familiar to many readers. 

Table 1 presents the bivariate correlation matrix for these 
data. As in all parametric analyses, a correlation or covariance 
matrix is the basis for all analyses; this matrix is partitioned 
into quadrants (see Table 1) honoring the variables' membership in 
criterion or predictor sets, and is then subjected to a principal 
components analysis (Thompson, 1984, in press-a). 

INSERT TABLE 1 ABOUT HERE. 

Appendix A presents the SPSS/LISREL computer program used to 
analyze the data. Table 2 presents the SPSS canonical correlation 
analysis of these same data. 



INSERT TABLE 2 ABOUT HERE. 



Table 3 presents the relevant portions of the LISREL analysis 
of the canonical correlation model for these data. The LISREL 
coefficients for the "gamma" matrix exactly match (within rounding 
error) the SPSS canonical function coefficients presented in Table 
2. The only exception is that all the signs for the SEM second 
canonical function coefficients must be "reflected." "Reflecting" 
a function (changing all the signs on a given function, factor, or 
equation) is always permissible, because the scaling of 
psychological constructs is arbitrary. Thus, the SEM and the 
canonical analysis derived the same results. Since SEM can be 
employed to test a CCA model, SEM is an even more general case of 
the general linear model, quod erat demonstrandum. 

INSERT TABLE 3 ABOUT HERE. 



Heuristic Implications 

There are a number of implications that can be drawn from the 
realization that a general linear model subsumes other methods as 
special cases. Specifically, all classical parametric methods are 
least squares procedures that implicitly or explicitly (a) use 
least squares weights (e.g., regression beta weights, standardized 
canonical function coefficients) to optimize explained variance and 
minimize model error variance, (b) focus on latent synthetic 
variables (e.g., the regression Ŷ variable) created by applying the 
weights (e.g., beta weights) to scores on measured/observed 
variables (e.g., regression predictor variables), and (c) yield 
variance-accounted-for effect sizes analogous to r² (e.g., R², eta², 
omega²). Thus, all classical analytic methods are correlational 
(Knapp, 1978; Thompson, 1988a). 

Designs may be experimental or correlational, but all analyses 
are correlational. Thus, an effect size analogous to r² can be 
computed in any parametric analysis (see Snyder & Lawson, 1993, or 
Kirk, 1996). 

The fact that all classical parametric methods use weights to 
then compute synthetic/latent variables by applying the weights to 
the measured/observed variables is obscured by the fact that most 
computer packages do not print the least squares weights that are 
actually invoked in ANOVA, for example, or when t-tests are 
conducted. Thus, some researchers unconsciously presume that such 
methods do not invoke optimal weighting systems. 
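
The point that t-tests and ANOVA are correlational analyses can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical (they are not data from the present paper), and the function names are assumptions introduced here. The sketch computes eta² from sums of squares for two groups, and then the r² between the scores and a 0/1 group code, which is the regression view of the same design.

```python
# Hypothetical scores for two groups (illustrative only, not from the paper).
group_a = [2.0, 4.0, 6.0]
group_b = [5.0, 7.0, 9.0]


def mean(values):
    return sum(values) / len(values)


def eta_squared(*groups):
    """ANOVA effect size: SS_between / SS_total."""
    scores = [y for g in groups for y in g]
    grand = mean(scores)
    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


def r_squared(x, y):
    """Squared Pearson correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Regression view of the same design: predict scores from a 0/1 group code.
codes = [0.0] * len(group_a) + [1.0] * len(group_b)
scores = group_a + group_b

print(eta_squared(group_a, group_b))  # ANOVA-style effect size
print(r_squared(codes, scores))       # same value, obtained as an r-squared
```

For the two-group case the two printed values agree, which is one concrete instance of the claim that an effect size analogous to r² is available in any parametric analysis.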

The fact that all classical parametric methods use weights to 
then compute synthetic/latent variables by applying the weights to 
the measured/observed variables is also obscured by the inherently 
confusing language of statistics. As I have noted elsewhere, the 
weights in different analyses 

...are all analogous, but are given different names 
in different analyses (e.g., beta weights in 
regression, pattern coefficients in factor analysis, 
discriminant function coefficients in discriminant 
analysis, and canonical function coefficients in 
canonical correlation analysis), mainly to obfuscate 
the commonalities of [all] parametric methods, and 
to confuse graduate students. (Thompson, 1992a, pp. 
906-907) 

If all standardized weights across analytic methods were called by 
the same name (e.g., beta weights), then researchers might 
(correctly) conclude that all analyses are part of the same general 
linear model. 

Indeed, both the weight systems (e.g., regression equation, 
factor) and the synthetic variables (e.g., the regression Ŷ 
variable) are also arbitrarily given different names across the 
analyses, again mainly so as to confuse the graduate students. 
Table 4 summarizes some of the elements of the very effective 
conspiracy. 

INSERT TABLE 4 ABOUT HERE. 



The present paper will employ this general linear model as a 
unifying conceptual framework for some of the arguments made 
herein. However, prior to presenting these views, a brief 
digression is required. 

Predictive Discriminant Analysis (PDA) as a Hybrid GLM Offshoot 

In the seminal work on discriminant analysis, Huberty (1994; 
see also Huberty and Barton (1989) and Huberty and Wisenbaker 
(1992)) thoughtfully distinguished two major applications: 
descriptive discriminant analysis (DDA) and predictive discriminant 
analysis (PDA). Put simply, DDA describes the differences on 
intervally-scaled "response" variables associated with a 
nominally-scaled variable, membership in different groups. PDA, on 
the other hand, uses intervally-scaled "response" variables to predict 
membership in different groups. Thus, the purpose of the analysis 
distinguishes the two methods (and these purposes subsequently 
determine which aspects of the results are relevant or irrelevant). 

The drawing of a distinction between DDA and PDA is not mere 
statistical nit-picking. Instead, the relevant aspects of DDA and 
PDA results are completely different. For example, in PDA the "hit 
rate" (and which response variables most contribute to the hit 
rate) is the sine qua non of the analysis, while the weights are 
generally irrelevant as regards result interpretation. In DDA, on 
the other hand, the weights and the "structure" of the 
synthetic/latent variable scores are very important to 
interpretation, but the concept of hit rate becomes irrelevant. 

The number of systems of weights (i.e., "functions," or 
"rules") also differs across DDA and PDA. In DDA, the number of 
linear discriminant functions (LDFs) is the number of groups minus 
one, or the number of response variables, whichever is smaller. In 
PDA, the number of linear classification functions (LCFs) is the 
number of groups. For example, with two groups and three response 
variables, in DDA there would be one LDF (and an associated set of 
scores on the synthetic variable, the discriminant scores). In the 
same case, in PDA there would be two LCFs (and associated sets of 
scores on the synthetic variables, the classification scores). 
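
The counting rules just described can be written out as a small Python sketch. The function names are hypothetical and are introduced only for illustration; the rules themselves are the ones stated in the paragraph above.

```python
def n_discriminant_functions(n_groups, n_response_vars):
    """DDA: number of LDFs is groups minus one or the number of
    response variables, whichever is smaller."""
    return min(n_groups - 1, n_response_vars)


def n_classification_functions(n_groups):
    """PDA: number of LCFs equals the number of groups."""
    return n_groups


# The example from the text: two groups and three response variables.
print(n_discriminant_functions(2, 3))  # 1
print(n_classification_functions(2))   # 2
```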

PDA is a hybrid offshoot of the general linear model, while 
DDA resides fully within the GLM nuclear family. Thus, the 
conclusions reached here based on GLM concepts may not apply to the 
PDA case. 

When More Variables Can Hurt Study Effects 

One powerful demonstration of PDA versus DDA dynamics involves 
a paradox. In any GLM analysis, more variables (e.g., more 
regression predictors) always lead to effect sizes (e.g., R²) that 
are equal to or greater than the effects associated with fewer 
variables. However, in PDA, more response variables can actually 
hurt the PDA hit rate. 

The Table 5 data, drawn from the Holzinger and Swineford 
(1939) data described previously, can be analyzed to illustrate 
these dynamics. The Appendix B SPSS program conducts the relevant 
analyses. 



INSERT TABLE 5 ABOUT HERE. 

Table 6 presents the hit rates derived using three response 
variables as predictors using both LDF and LCF scores; these hit 
rates are both 66.4% ([40 + 31] / 107). [Normally only LCFs are 
used for classification purposes, even though SPSS incorrectly uses 
LDF scores for this purpose (Huberty & Lowman, 1997).] Table 6 
also presents the hit rates derived using four response variables 
as predictors using both LDF and LCF scores; these hit rates are 
both 63.6% ([38 + 30] / 107). Figure 1 presents the corresponding 
results in graphic form. 

INSERT TABLE 6 AND FIGURE 1 ABOUT HERE. 

Indeed, the hit rate difference with the use of three versus 
four response variables is even greater than the apparent 
difference of 71 versus 68 people, respectively, being correctly 
classified. In fact, as noted in Table 7, 9 persons were classified 
differently across the analyses using three versus four response 
variables, even though the net impact of using more predictors was 
a net loss in predictive accuracy of three hits. [If the same data 
were treated as reflecting a DDA case, the Wilks lambda effect size 
would be the same or better (i.e., a smaller lambda value) for four 
(0.8050684) as against three (0.8094909) response variables, as is 
always true in the GLM case.] 
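
The arithmetic behind this paradox can be checked with a short Python sketch. It uses only the counts and Wilks lambda values quoted above; the variable and function names are assumptions made for illustration.

```python
N_CASES = 107  # total cases classified, as reported in the text


def hit_rate(correct_per_group, n_cases):
    """Proportion of cases classified into their actual group."""
    return sum(correct_per_group) / n_cases


three_var_hits = [40, 31]  # correct classifications, three response variables
four_var_hits = [38, 30]   # correct classifications, four response variables

rate_three = hit_rate(three_var_hits, N_CASES)
rate_four = hit_rate(four_var_hits, N_CASES)
net_change = sum(four_var_hits) - sum(three_var_hits)

# Wilks lambda values quoted in the text (smaller is better).
lambda_three = 0.8094909
lambda_four = 0.8050684

print(f"{rate_three:.1%}")         # 66.4%
print(f"{rate_four:.1%}")          # 63.6%
print(net_change)                  # -3
print(lambda_four <= lambda_three) # True: the DDA effect did not get worse
```

The PDA hit rate falls by three cases while the DDA lambda improves slightly, which is the contrast the example is meant to convey.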

INSERT TABLE 7 ABOUT HERE. 

Elsewhere I (Thompson, 1995b) have explained some of these 
counterintuitive dynamics by portraying a hypothetical set of 
results involving five response variables. Presume there were three 
"fence-riders," that is, cases very near the classification 
boundaries (arbitrarily cases #4, #11, and #51). Let's say with 
five predictor variables our initial lambda is .50, and let's say 
we add an additional, sixth response variable as a PDA predictor. 

Clearly, having more predictive information always helps us 
better explain data dynamics, or at least can't take away what we 
already know. This is reflected by the fact that the Wilks lambda 
value will always stay the same or get better (i.e., smaller) as we 
add predictor variables. 

But this occurs only on the average, as 
reflected in on-the-average statistics such as 
lambda. While relative explanatory power will 
remain the same or improve on the average, at the 
case level each and every single case will not 
necessarily move toward its actual group's location 
when the additional sixth predictor variable is 
used. For example, let's say that all cases' 
positions except cases #4, #11, #51 and #43 remain 
fixed in essentially their initial locations and 
that group territorial boundaries also remain 
roughly unchanged. 

If because the sixth predictor was especially 
useful in locating case #43, case #43 might move 
very far toward but not over the boundary that would 
have yielded a correct classification. Lambda would 
reflect this change by getting better (i.e., 
smaller), such as changing from .50 to perhaps .45. 

Cases #4, #11, and #51 might move slightly away from 
their actual group, because although the sixth 
predictor will either not change explanatory power 
or will provide more information on the average, it 
is still possible that the sixth predictor may 
provide misinformation about these three particular 
cases, resulting in their moving across their actual 
group boundary and becoming misclassified. This 
small movement will, of course, be reflected in 
lambda, which will correspondingly get only slightly 
worse (i.e., bigger), such as moving from .45 to 
.46. Yet even though on the average locations have 
gotten more accurate and lambda has consequently 
improved from the original .50 to the final .46, the 
number of cases correctly classified when using all 
six predictors will have gotten worse by a net 
classification-accuracy change of minus three cases. 
(Thompson, 1995b, p. 345, emphasis in original) 

Error #1: Using Stepwise Methods 

Huberty (1994) has noted that, "It is quite common to find the 
use of 'stepwise analyses' reported in empirically based journal 
articles" (p. 261). Huebner (1991, 1992) and Jorgenson, Jorgenson, 
Gillis and McCall (1993) are a few examples from among the many 
egregious reports of stepwise analyses. 

Stepwise methods continue to be used, notwithstanding scathing 
indictments of many of these applications (cf. Huberty, 1989; 
Snyder, 1991). My own feelings are intimated by the title of one of 
my editorials, viz. "Why won't stepwise methods die?" (Thompson, 
1989). 

Three major problems with stepwise methods can be noted, and will be 
briefly summarized here. A more complete treatment is available in 
Thompson (1995c). 

The consequences of these three problems are quite serious. As 
Cliff (1987, p. 185) noted, "most computer programs for [stepwise] 
multiple regression are positively satanic in their temptations 
toward Type I errors." He also suggested that, "a large proportion 
of the published results using this method probably present 
conclusions that are not supported by the data" (pp. 120-121). 

Wrong Degrees of Freedom 

First, most computer packages (and thus most researchers) use 
the wrong degrees of freedom in their statistical significance 
tests for stepwise methods, thus systematically always inflating 
the likelihood of obtaining statistically significant results. 
Degrees of freedom are the "coins" we pay to investigate the 
dynamics within our data. The statistical significance tests take 
into account both the number of coins we've chosen to spend and the 
number we have chosen to reserve. 

The most rigorous tests occur when we spend few degrees of 
freedom and reserve many. Conversely, at the extreme, all models 
with no degrees of freedom reserved (i.e., degrees of freedom error 
=0) always fit the data perfectly. For example, the bivariate r² 
with n=2 inherently is always 1.0, as long as both X and Y are 
variables. Similarly, the multiple R² with two predictor 
variables and n=3 inherently must always be 1.0. 
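
The claim about n=2 is easy to verify numerically. The following Python sketch uses three arbitrary, hypothetical pairs of points (none are from the paper); with no degrees of freedom reserved for error, each bivariate r² comes out at 1.0 regardless of the scores chosen.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Arbitrary two-case data sets; X and Y both vary in each one.
two_case_samples = [
    ([1.0, 2.0], [5.0, 3.0]),
    ([0.0, 10.0], [-4.0, 7.5]),
    ([3.0, 8.0], [2.0, 2.5]),
]

for x, y in two_case_samples:
    print(r_squared(x, y))  # 1.0 (within floating-point error) every time
```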

The computer packages conventionally charge degrees of freedom 
for the numerator (synonymously also called "model," "between," 
"regression," and "explained," to confuse the graduate students) 
that are a function of the number of response variables "entered" 
in the analysis at a given step. The remaining degrees of freedom 
(synonymously called "denominator," "residual," "error," "within," 
and "unexplained") are inversely related to the number of response 
variables "entered" in a given step. 

Table 8 illustrates these dynamics for a study involving 2 
steps of stepwise analysis, with k=3 groups and n=120 people. Table 
8 compares the results for two steps of analysis using the degrees 
of freedom calculations employed by SPSS and other computer 
packages, labelled "Incorrect," with the same calculations 
employing the correct degrees of freedom. 

INSERT TABLE 8 ABOUT HERE. 



The difference between the analyses revolves around what "entered" 
means. The computer packages define "entered" or "used" as actually 
entered into the prediction equation. Thus, in step one the 
packages consider that only one predictor has been entered, while 
in step two the packages consider that two response variables have 
been entered. 

However, in this example each and every one of the 50 response 
variables was "used" at each and every one of the two steps, to 
decide which variable to enter at each step. The 49 or 48 
unselected response variables may not have been retained in the 
analysis, but each one was examined, and played with, and actually 
tasted, prior to the leftovers then being returned to the cafeteria 
display case. 

This system of determining the degrees of freedom bill is 
analogous to only charging John Belushi in the movie Animal House 
for the food on his cafeteria tray, and charging nothing for what 
he has tasted and discarded. Clearly, this statistical package 
system of coinage is wrong. [Charging only for variables actually 
entered at each step would be appropriate, for example, if these 
response variables were randomly selected without first tasting 
each and every response variable.] 

It is instructive to see how using the wrong degrees of 
freedom in the numerator of the statistical significance testing 
calculations, and the wrong denominator df in the calculations, 
both bias the tests in favor of getting statistical significance. 
Table 8 illustrates how dramatic the effect of using the wrong 
degrees of freedom can be. 

After one step, the computer calculates that F(2, 117) = 15.29841, 
with an associated probability of .0000012; the correct F(100, 136) is 
0.16751, with an associated probability of 1.00000. After the 
second step, the computer calculates that F(4, 232) = 13.64322, with an 
associated probability of .0000945; the correct F(100, 136) is 0.31991, 
with an associated probability of 1.00000. Obviously, the example 
illustrates that the correct and incorrect results can be 
night-vs-day different! 
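
The degrees-of-freedom arithmetic can be sketched in Python. This is an illustration under stated assumptions, not the SPSS computation: it uses the standard exact F transformation of Wilks lambda for the three-group case, and the function name wilks_f_three_groups is hypothetical. The lambda of .6553991 is the step-two value reported for the heuristic data later in the paper; the only thing that changes between the two calls is how many response variables are charged for.

```python
import math


def wilks_f_three_groups(wilks_lambda, n, p):
    """Exact F transformation of Wilks lambda for exactly three groups.

    n is the total sample size and p is the number of response
    variables charged for. Returns (F, df_numerator, df_denominator).
    """
    if p == 1:
        # One response variable: the ordinary one-way ANOVA F test.
        df1, df2 = 2, n - 3
        f = (1 - wilks_lambda) / wilks_lambda * df2 / df1
    else:
        root = math.sqrt(wilks_lambda)
        df1, df2 = 2 * p, 2 * (n - p - 2)
        f = (1 - root) / root * (n - p - 2) / p
    return f, df1, df2


N = 120                   # sample size in the example
STEP_TWO_LAMBDA = 0.6553991

# "Incorrect": charge only for the 2 variables entered after two steps.
print(wilks_f_three_groups(STEP_TWO_LAMBDA, N, p=2))   # F near 13.643 on (4, 232)

# "Correct": charge for all 50 variables examined at every step.
print(wilks_f_three_groups(STEP_TWO_LAMBDA, N, p=50))  # F near 0.320 on (100, 136)
```

The same lambda yields an F more than forty times smaller once all 50 examined variables are paid for, which is why the associated probabilities differ so sharply.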

Three factors determine exactly how egregiously the use of the 
wrong degrees of freedom distorts the stepwise results. The 
distortions are increasingly serious as (a) sample size is smaller, 
(b) the number of steps is larger, and (c) the number of response 
variables available to be selected is larger. 

Nonreplicability of Results 

Second, stepwise methods tend to yield results that are 
sample-specific and do not generalize well to future studies. This 
is because stepwise requires a linear sequence of decisions, each 
of which is contingent upon all the previous decisions in the 
sequence. This is very much like walking through a maze — an 
incorrect decision at any point will lead to a cascade of 
subsequent decisions that each may themselves be wrong. 

Stepwise considers all differences of any magnitude between 
variance explained by the response variables to be exact and true. 
Since there are usually numerous combinations of the response 
variables, and credit for variance explained for each partition of 
the variables may be influenced by sampling error, any small amount 
of sampling error anywhere in a single response variable can lead 
to disastrously erroneous choices in the linear sequence of 
stepwise selection decisions. 

Stepwise Does NOT Identify the Best Variable Set of a Given Size 

Third, stepwise methods do not correctly identify the best set 
of predictors of a given response variable set size, k. For 
example, if one has 30 response variables, and does three steps of 
analysis, it is possible that the best predictor set of size k=3 
will include none of the three variables selected after three steps 
of stepwise analysis of the same data, and that the three stepwise 
variables would also yield a lower effect size. 

This may seem counter-intuitive, but upon reflection, it 
should be easy to see that in fact stepwise analysis does not seek 
to identify the best variable set of a certain size. Stepwise 
simply does not ask the question, "What is the best predictor set 
of a given size?" This question requires simultaneously considering 
all the combinations of the variables that are possible for a given 
set size. Stepwise analysis never simultaneously considers all the 
combinations of the predictor variables. Rather, at each step 
stepwise analysis takes the previously entered variables as a 
given, and then asks which one change in the predictor set will 
most improve the prediction. 

Picking the best new variable in a sequence of selections is 
not the same as picking the best variable set of a given size. As 
Thompson (1995c) explained: 

Suppose one was picking a basketball team consisting 
of five players. The stepwise selection strategy 
picks the best potential player first, then the 
second best player in the context of the 
characteristics of the previously-selected first 
player, and so forth. 

An alternative strategy is an all-possible-subsets 
approach which asks, "which five potential 
players play together best as a team?" This team 
might conceivably contain exactly zero of the five 
players selected through the stepwise approach. 
Furthermore, this "best team" might be able to stomp 
the "stepwise team" by a considerable margin, 
because teams consisting of players of lesser 
abilities may still play together better as a team 
than players selected through a linear sequence of 
stepwise decisions. (pp. 528, 530, emphasis in 
original) 

The Table 9 data provide a powerful heuristic. Table 10 
presents an abridged printout for these data involving two steps of 
stepwise DDA, conducted using the Appendix C SPSS program. In this 
analysis the stepwise algorithm selects response variables X1 and
X2, and the lambda value is .6553991 (F(4, 232) = 13.64322).

INSERT TABLES 9 AND 10 ABOUT HERE. 



Compare the Table 10 results with those in Table 11. Table 11 
presents the DDA results for all six possible combinations of the 
four response variables considered two at a time. Note that the 
best set of two variables (i.e., smallest lambda) involves response 
variables X3 and X4 (lambda = .6272538, F(4, 232) = 15.23292). The best
variable set of size two contained neither of the two variables 
selected by the stepwise analysis!!!!! 
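
The two reported F statistics can be recovered from the lambda values. The sketch below assumes the exact F transformation of Wilks' lambda that holds when there are two response variables, and it assumes 3 groups and a total n of 120; both figures are inferred here from the reported degrees of freedom (4, 232) rather than taken from the tables. The function name is hypothetical.

```python
import math


def f_from_lambda_two_vars(wilks_lambda, n_total, n_groups):
    """Exact F for Wilks' lambda when there are exactly two response variables."""
    root = math.sqrt(wilks_lambda)
    f_value = (1 - root) / root * (n_total - n_groups - 1) / (n_groups - 1)
    df = (2 * (n_groups - 1), 2 * (n_total - n_groups - 1))
    return f_value, df


# Lambda values as reported for the stepwise pair and for the best pair.
f_stepwise, df_stepwise = f_from_lambda_two_vars(0.6553991, n_total=120, n_groups=3)
f_best, df_best = f_from_lambda_two_vars(0.6272538, n_total=120, n_groups=3)

print(df_stepwise, round(f_stepwise, 4))
print(df_best, round(f_best, 4))
```

Because lambda is the proportion of variance not explained, the smaller lambda for X3 and X4 necessarily yields the larger F at the same degrees of freedom.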



INSERT TABLE 11 ABOUT HERE. 



Error #2: Ignoring the Context Specificity of GLM Weights
As noted previously, all univariate and multivariate methods 
apply weights to the measured variables to derive scores on the 
latent or synthetic variables that are actually the focus of all 
analyses. Consequently, if (and only if) noteworthy effects (e.g.,
R², Rc²) are detected, it then becomes reasonable to consult the
weights as part of the process of determining which response
variables contributed to the detected effect. Indeed, some
researchers have even taken the view that these weights (e.g., beta
weights, standardized discriminant function coefficients) should be
the sole basis for evaluating the importance of response variables
(Harris, 1989).

Unfortunately, overinterpretation of GLM weights is a serious
threat. The weights can be greatly influenced by which variables
are included in or excluded from a given analysis. Furthermore,
Cliff (1987, pp. 177-178) noted that weights for a given set of
variables may vary widely across samples, and yet consistently
still yield the same effect sizes (i.e., be what he called
statistically "sensitive"). Clearly weights are not the sole story
in interpretation.

Any interpretations of weights must be considered context- 
specific. Any change in the variables in the model can radically 
alter all of the weights. Too few researchers appreciate the 
potential magnitudes of these impacts. 
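
A two-predictor regression analog shows how large the change can be. This is a minimal sketch under stated assumptions: the three correlations are invented for illustration, the function name is hypothetical, and the closed-form expression is the standard one for standardized weights with two predictors.

```python
def beta_weights(r1y, r2y, r12):
    """Standardized regression weights for two predictors, from correlations."""
    denom = 1 - r12 ** 2
    return (r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom


# Hypothetical correlations, invented for illustration only.
R_X1_Y, R_X2_Y, R_X1_X2 = 0.30, 0.60, 0.70

# With a single standardized predictor, the beta weight equals r.
beta_x1_alone = R_X1_Y

# Adding X2 to the model changes both the size and the sign of X1's weight.
beta_x1_with_x2, beta_x2 = beta_weights(R_X1_Y, R_X2_Y, R_X1_X2)

print("X1 alone:   beta =", round(beta_x1_alone, 3))
print("X1 with X2: beta =", round(beta_x1_with_x2, 3))
print("X2:         beta =", round(beta_x2, 3))
```

Nothing about X1 or its relationship with the criterion has changed between the two models; only the company it keeps has changed.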

The Table 12 data illustrate these dynamics. The analysis 
contrasts using DDA models with either three response variables 
(i.e., X1, X2, and X3) or four response variables (i.e., X1, X2,
X3, and X4). The example can be framed as either adding one 
response variable to an analysis involving three response 
variables, or deleting one response variable from an analysis 
involving four. This DDA example involves variance-covariance
matrices for each of three groups that are exactly equal (called
"homogeneity"), so the results are not confounded by failure to
meet one of the assumptions of the analysis.

INSERT TABLE 12 ABOUT HERE. 

Table 13 presents an excerpt from an SPSS analysis of the 
Table 12 data conducted using the Appendix D computer program. Note 
the dramatic changes in the DDA standardized function coefficients. 
For example, with three response variables the first response 
variable, X1, had standardized function coefficients of 1.50086 and
-.01817 on the two DDA functions. With four response variables X1
had standardized function coefficients of -.47343 and 1.22249 on 
the two DDA functions. Thus, the coefficients were quite variable 
in both magnitude and sign. 

INSERT TABLE 13 ABOUT HERE. 



These fluctuations are not problematic if (and only if) the
researcher has selected exactly the right model (i.e., has not made
what statisticians call a model specification error). But as
Pedhazur (1982) has noted, "The rub, however, is that the true
model is seldom, if ever, known" (p. 229). And as Duncan (1975) has
noted, "Indeed it would require no elaborate sophistry to show that
we will never have the 'right' model in any absolute sense" (p.
101).

In other words, as a practical matter, the context-specificity
of weights is always problematic, and the weights consequently must
be interpreted cautiously. Some researchers acknowledge the
vulnerability of the weights to sampling error influences (i.e.,
the so-called "bouncing beta" problem), but a more obvious concern
is the context-specificity of the weights in the real-world context
of full or partial model misspecification.

Error #3: Failing to Interpret
Both Weights and Structure Coefficients

A response variable given a standardized weight of zero is
being obliterated by the multiplicative weighting process,
indicating either that (a) the variable has zero capacity to
explain relationships among the variables or that (b) the variable
has some explanatory capacity, but one or more other variables
yield the same explanatory information and are arbitrarily (not
wrongly, just arbitrarily) receiving all the credit for the
variable's predictive power. Because a weight of zero may reflect
condition (b) rather than condition (a), it is essential to evaluate
other coefficients in addition to standardized weights during
interpretation, to determine the specific basis for the weighting.

Just as it would be incorrect to evaluate predictor variables 
in a regression analysis only by consulting beta weights (Cooley & 
Lohnes, 1971, p. 55; Thompson & Borrello, 1985), in any GLM 
analysis it would be inappropriate to only consult standardized 
weights during result interpretation (Borgen & Seling, 1978, p. 
692; Kerlinger & Pedhazur, 1973, p. 344; Levine, 1977, p. 20; 
Meredith, 1964, p. 55; Thompson, 1997b). Yet, some researchers do
exactly that (cf. Humphries-Wadsworth, 1998).

Under most circumstances standardized weights are not
correlation coefficients. Thus, some of the weights in Table 11
are less than -1 or are greater than +1. Structure coefficients,
on the other hand, are always correlation coefficients, and reflect
the linear relationship between scores on a given measured or
observed variable with the scores on a given latent or synthetic
variable. Thus, because synthetic variables are actually the focus
of all parametric analyses, and because structure coefficients 
reveal the structure of these latent variables, the importance of 
structure coefficients seems obvious. 

Three possible cases can be delineated. The three 
illustrations demonstrate that jointly considering both 
standardized weights and structure coefficients indicates to the 
researcher which case is present in a given analysis. Appendix E 
presents the SPSS computer program used to analyze the three 
heuristic data sets. 

Case #1: Function and Structure Coefficients are Equal 

In the special GLM case where measured variables are 
uncorrelated, the standardized weights in this case (and in this 
case only) are correlation coefficients. For example, in 
regression, if the predictor variables are uncorrelated, each 
predictor variable's beta weight equals that variable's
product-moment correlation with the criterion variable. In discriminant
analysis, the same principle applies if the "pooled" correlation
matrix of the response variables indicates that the response 
variables are uncorrelated.

Table 14 presents a hypothetical DDA data set illustrating
this case for a k=3 group problem involving scores of n=30 people
on each of p=3 response variables. As indicated by the Table 15
excerpt from the SPSS output for these data, in this special case
the standardized function coefficients exactly equal the respective
structure coefficients of the response variables.
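
The regression version of this special case follows directly from the usual expression for standardized weights. The notation is introduced here for illustration and is not the author's: R_xx is the matrix of correlations among the predictors, r_xy is the vector of predictor-criterion correlations, and beta is the vector of standardized weights.

```latex
\boldsymbol{\beta} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xy},
\qquad
\mathbf{R}_{xx} = \mathbf{I}
\;\Longrightarrow\;
\boldsymbol{\beta} = \mathbf{I}^{-1}\,\mathbf{r}_{xy} = \mathbf{r}_{xy}.
```

When any off-diagonal element of R_xx is non-zero, the inverse is no longer the identity matrix, and the weights generally stop being correlation coefficients.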



INSERT TABLES 14 AND 15 ABOUT HERE. 



Case #2: Measured Variables with Near-zero Weights Still Important 

As noted previously, measured variables may be assigned 
multiplicative weights of zero if the measured variable contains 
useful variance, but that variance is also present in some 
combination of the other measured variables. The researcher 
interpreting these results, especially if only standardized weights 
are interpreted, might erroneously conclude that such a response 
variable with a near-zero weight had essentially no utility in 
generating the observed effect. Instead, the result merely 
indicates that this variable is arbitrarily being denied credit for 
its potential contributions. 

Table 16 presents a relevant heuristic DDA data set for this 
case involving k=3 groups and p=3 response variables. Table 17 
presents an excerpt from the related SPSS analysis of the tabled 
data. 



INSERT TABLES 16 AND 17 ABOUT HERE. 



In this example, the standardized function coefficient on
Function I for X3 was -.05507, while on the same function the other
two response variables had standardized function coefficients of
roughly +.95. Yet the squared structure coefficient (rs² = .81431²
= 66.3%) for X3 on the function indicates that X3 had more than
twice the explanatory power of variables X1 (rs² = .54141² = 29.3%)
and X2 (rs² = .56453² = 31.9%). Clearly, consulting only the function
coefficients for this example would have resulted in a serious
misinterpretation of results.
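
The percentages just cited are simply the squares of the reported Function I structure coefficients, which a few lines of arithmetic can confirm. The coefficient values are those given in the text; the variable names in the sketch are assumptions.

```python
# Function I structure coefficients as reported in the text.
structure = {"X1": 0.54141, "X2": 0.56453, "X3": 0.81431}

# A squared structure coefficient is the proportion of variance a measured
# variable shares with the scores on the function.
squared_pct = {name: round(100 * r ** 2, 1) for name, r in structure.items()}

for name, pct in squared_pct.items():
    print(f"{name}: {pct}%")

# Ratios of X3's squared coefficient to those of X1 and X2.
ratio_to_x1 = structure["X3"] ** 2 / structure["X1"] ** 2
ratio_to_x2 = structure["X3"] ** 2 / structure["X2"] ** 2
print(round(ratio_to_x1, 2), round(ratio_to_x2, 2))
```

Both ratios exceed 2, which is the basis for the statement that X3 carries more than twice the explanatory power of either of the other two variables.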

Case #3: "Suppressor" Effects

The previous case makes clear that a measured variable 
assigned a zero or near-zero weight may nevertheless be an 
important variable, as reflected in the variable having a large 
non-zero structure coefficient. However, although it may seem 
counter-intuitive, a measured/observed variable may also have a 
zero or near-zero structure coefficient, and still be very 
important in defining a detected effect, as reflected in the 
variable having a non-zero standardized weight. [That is, only 
measured variables with both near-zero weights and near-zero 
structure coefficients are useless in defining a given detected 
effect.]

Such a variable is classically termed a "suppressor" variable. 
However, although the name may feel pejorative, a "suppressor" 
variable actually increases the effect size, and so suppression is 
a good (and not a bad) thing. As defined by Pedhazur (1982, p.
104), in the related regression case, "A suppressor variable is a
variable that has a zero, or close to zero, correlation with the
criterion but is correlated with one or more than one of the
predictor variables." Henard (1998) provides a nice overview of
suppressor effects.

Suppressor effects are quite difficult to explain in an
intuitive manner. But Horst (1966) gave an example that is
relatively accessible. He described the multiple regression
prediction of pilot training success during World War II using
mechanical, numerical, spatial, and verbal ability scores, each
measured with paper and pencil tests. The verbal scores had very low
correlations with the dependent variable, but had larger
correlations with the other three predictors, since they were all
measured with paper and pencil tests (i.e., measurement artifacts
inflate correlations among traits measured with similar methods). As
Horst (1966, p. 355) noted, "Some verbal ability was necessary in
order to understand the instructions and the items used to measure
the other three abilities."

Including verbal ability scores in the regression equation in
this example actually served to remove the contaminating influence
of one predictor from the other predictors, which effectively
increased the R² value from what it would have been if only
mechanical, numerical and spatial abilities had been used as
predictors. The verbal ability variable had negative beta weights
in the equation. As Horst (1966, p. 355) noted, "To include the
verbal score with a negative weight served to suppress or subtract
irrelevant ability, and to discount the scores [on the other
predictors] of those who did well on the test simply because of
their verbal ability rather than because of abilities required for
success in pilot training." The fact that a measured variable
unrelated to a measured criterion variable can still make important
contributions in an analysis itself makes the very important point
that the latent or synthetic variables analyzed in all parametric
methods are always more than the sum of their constituent parts.
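
The arithmetic of suppression can be sketched with two predictors. The values below are invented for illustration and are not Horst's data: "ability" stands for a useful predictor, and "verbal" for a predictor that is uncorrelated with the criterion but correlated with the other predictor. The function name and the closed-form two-predictor expressions are assumptions of this sketch.

```python
def beta_weights(r1y, r2y, r12):
    """Standardized regression weights for two predictors, from correlations."""
    denom = 1 - r12 ** 2
    return (r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom


# Hypothetical correlations, invented for illustration only.
R_ABILITY_Y = 0.40       # useful predictor with the criterion
R_VERBAL_Y = 0.00        # suppressor: unrelated to the criterion
R_ABILITY_VERBAL = 0.60  # shared method variance between the two predictors

r2_ability_alone = R_ABILITY_Y ** 2

beta_ability, beta_verbal = beta_weights(R_ABILITY_Y, R_VERBAL_Y, R_ABILITY_VERBAL)
r2_with_suppressor = beta_ability * R_ABILITY_Y + beta_verbal * R_VERBAL_Y

print("R squared, ability alone:      ", round(r2_ability_alone, 3))
print("R squared, ability plus verbal:", round(r2_with_suppressor, 3))
print("beta weights:", round(beta_ability, 3), round(beta_verbal, 3))
```

In this sketch the suppressor receives a negative weight and raises the effect size, even though its own correlation with the criterion (and hence its structure coefficient) is exactly zero.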

Table 18 presents a relevant heuristic DDA data set for this 
case involving k=3 groups and p=3 response variables. Table 19 
presents an excerpt from the related SPSS analysis of the tabled 
data. As reported in Table 19, on Function I DDA response variable
X3 had a near-zero structure coefficient (rs = -.03464), but a large
non-zero standardized function coefficient (i.e., -1.58393).
Indeed, on this function X3 had the largest absolute standardized
function coefficient, since X1 and X2 had standardized function
coefficients of +1.22956 and +1.21174, respectively.

INSERT TABLES 18 AND 19 ABOUT HERE. 



Error #4: Failing to Recognize that 
Reliability Is Not a Characteristic of Tests 

Nature of Score Reliability 

Misconceptions regarding the nature of reliability abound 
within the social sciences. For example, some researchers do not 
realize that, "Notwithstanding erroneous folkwisdom to the 
contrary, sometimes scores from shorter tests are more reliable 
than scores from longer tests" (Thompson, 1990, p. 586). In her 
important recent article, Vacha-Haase (1998a) cited the example of
the Bem Sex-Role Inventory, noting that, "[i]n fact, the 20-item
short-form of the Bem generally yields more reliable scores (rxx
for the feminine scale ranging from .84 to .87) than does the 40-item
long-form (rxx for the feminine scale ranging from .75 to
.78)" (pp. 9-10).

Misconceptions regarding reliability flourish in part because 
[a]lthough most programs in sociobehavioral 
sciences, especially doctoral programs, require a 
modicum of exposure to statistics and research 
design, few seem to require the same where 
measurement is concerned. Thus, many students get 
the impression that no special competencies are 
necessary for the development and use of measures... 
(Pedhazur & Schmelkin, 1991, pp. 2-3) 

Empirical study of doctoral curricula confirms this impression 
(Aiken et al., 1990). 

The most fundamental problem is that too few researchers act 
on a conscious recognition that reliability is a characteristic of 
scores or the data in hand, and not of tests. Test booklets are not 
impregnated with reliability during the printing process. The WISC 
that yields reliable scores for some adults on a given occasion of 
measurement will not necessarily do so when the same test is 
administered to first-graders. 

Many researchers recognize these dynamics on some level, but 
unconscious paradigm influences constrain too many researchers from 
actively integrating this presumption into their actual analytic 
practice. The pernicious practice of saying, "the test is
reliable," creates a language that unconsciously predisposes
researchers against acting on a conscious realization that tests
themselves are not reliable (Thompson, 1994c). Reinhardt (1996)
provides an excellent relevant review of reliability coefficients,
and the factors that impact score reliability.

As Rowley (1976, p. 53, emphasis added) argued, "It needs to 
be established that an instrument itself is neither reliable nor 
unreliable.... A single instrument can produce scores which are 
reliable, and other scores which are unreliable." Similarly, 
Crocker and Algina (1986, p. 144, emphasis added) argued that, 
"...A test is not 'reliable' or 'unreliable.' Rather, reliability 
is a property of the scores on a test for a particular group of 
examinees."

In another widely respected text, Gronlund and Linn (1990, p.
78, emphasis in original) noted,

Reliability refers to the results obtained with an
evaluation instrument and not to the instrument
itself.... Thus, it is more appropriate to speak of
the reliability of the "test scores" or of the
"measurement" than of the "test" or the
"instrument."

And Eason (1991, p. 84, emphasis added) argued that: 

Though some practitioners of the classical 
measurement paradigm [incorrectly] speak of 
reliability as a characteristic of tests, in fact 
reliability is a characteristic of data, albeit data 
generated on a given measure administered with a
given protocol to given subjects on given occasions.

The subjects themselves impact the reliability of scores, and
thus it becomes an oxymoron to speak of "the reliability of the
test" without considering to whom the test was administered, or
other facets of each individual measurement protocol. Reliability
is driven by variance: typically, greater score variance leads to
greater score reliability, and so more heterogeneous samples often
lead to more variable scores, and thus to higher reliability.
Therefore, the same measure, when administered to more heterogeneous
or to more homogeneous sets of subjects, will yield scores with
differing reliability. As Dawis (1987, p. 486) observed, "[b]ecause
reliability is a function of sample as well as of instrument, it
should be evaluated on a sample from the intended target
population -- an obvious but sometimes overlooked point."
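
A small simulation illustrates how the same items can yield scores of quite different reliability in heterogeneous and homogeneous samples. Everything in the sketch is an assumption made for illustration: the ten parallel items, the normally distributed true scores and errors, the arbitrary seed, and the selection of the homogeneous group by true score. Coefficient alpha is computed with the usual formula.

```python
import random
from statistics import variance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of persons, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


rng = random.Random(1998)  # arbitrary seed so the run is repeatable
N_PERSONS, N_ITEMS = 2000, 10

# Each item score is the person's true score plus independent random error.
true_scores = [rng.gauss(0, 1) for _ in range(N_PERSONS)]
scores = [[t + rng.gauss(0, 1) for _ in range(N_ITEMS)] for t in true_scores]

# The homogeneous subsample keeps only people near the middle of the trait.
homogeneous = [row for t, row in zip(true_scores, scores) if abs(t) < 0.5]

alpha_heterogeneous = cronbach_alpha(scores)
alpha_homogeneous = cronbach_alpha(homogeneous)

print("alpha, full heterogeneous sample:", round(alpha_heterogeneous, 2))
print("alpha, homogeneous subsample:    ", round(alpha_homogeneous, 2))
```

The items and the administration are identical in both computations; only the people differ, and so the reliability coefficients describe the two sets of scores rather than the instrument.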

Our shorthand ways of speaking (e.g., language saying "the
test is reliable") can themselves cause confusion and lead to bad
practice. As Pedhazur and Schmelkin (1991, p. 82, emphasis in
original) observed, "Statements about the reliability of a measure
are... inappropriate and potentially misleading." These telegraphic
ways of speaking are not inherently problematic, but they often
later become so when we come unconsciously to ascribe literal truth
to our shorthand, rather than recognizing that our jargon is merely
telegraphic and is not literally true. As noted elsewhere:

This is not just an issue of sloppy speaking -- the
problem is that sometimes we unconsciously come to
think what we say or what we hear, so that sloppy
speaking does sometimes lead to a more pernicious
outcome, sloppy thinking and sloppy practice.
(Thompson, 1992c, p. 436)

Implications for Practice 

These views suggest at least three implications for research 
practice. These practices are, unfortunately, not yet normative 
within the social sciences. 

Language Use. One fairly straightforward recommendation is
that researchers should not use language saying that, "the test is
reliable [or valid]," or that, "the reliability [or validity] of
the test was .xx." Because on its face this language is inaccurate,
and asserts untruth, it seems imprudent to use such language in
scholarly discourse. The editorial policies of at least one journal
commend better, correct practices:

Based on these considerations, use of wording such 
as "the reliability of the test" or "the validity of 
the test" will not be considered acceptable in the 
journal. Instead, authors should use language such 
as, "the scores in our study had a classical theory 
test-retest reliability coefficient of X," or "based 
on generalizability theory analysis, the scores in 
our study had a phi coefficient of X." Use of 
technically correct language will hopefully 
reinforce better practice. (Thompson, 1994c, p. 841) 

Coefficient Reporting. Researchers also ought to routinely
report the reliability coefficients for their own data. Many do not
do so now, because they act under the pernicious misconception that
tests are reliable, and that reliability is therefore invariant
across administrations.

But it is sloppy practice to not calculate, report, and
interpret the reliability of one's own scores for one's own data.
As Pedhazur and Schmelkin (1991, p. 86, emphasis in original)
argued:

Researchers who bother at all to report reliability 
estimates for the instruments they use (many do not) 
frequently report only reliability estimates 
contained in the manuals of the instruments or 
estimates reported by other researchers. Such 
information may be useful for comparative purposes, 
but it is imperative to recognize that the relevant 
reliability estimate is the one obtained for the 
sample used in the [present] study under 
consideration.

Unhappily, empirical studies indicate that such reports are 
infrequent (Meier & Davis, 1990; Willson, 1980) in most journals, 
although there are exceptions (Thompson & Snyder, in press).

In her important paper proposing "reliability generalization" 
methods to characterize (a) the mean and (b) the standard deviation 
of score reliabilities for a given instrument across studies, and 
to explore (c) the sources of variability in score reliabilities, 
Vacha-Haase noted a benefit from the routine reporting of score 
reliability even in substantive studies: 

Furthermore, if authors of empirical studies 
routinely report reliability coefficients, even in 
substantive studies, the field will cumulate more
evidence regarding the psychometric integrity of
scores. Such practices would provide more fodder for
reliability generalization analyses focusing upon
the differential influences of various sources of
measurement error. (Vacha-Haase, 1998a, p. 14)

Interpret Results in a Reliability Context. Effect sizes can
and should be computed in all studies; Kirk (1996) and Snyder and
Lawson (1993) provide excellent reviews of the many options. When
and if these effects are deemed (a) noteworthy in magnitude and (b)
replicable, then (and only then) these effect sizes should also be
interpreted.

Score reliability is one of the several study features that 
impact detected effects. Score measurement errors always attenuate 
computed effects to some degree (Schneider & Darcy, 1984). This
attenuation ought to be considered when interpreting reported
effects. As I have noted elsewhere,

The failure to consider score reliability in 
substantive research may exact a toll on the 
interpretations within research studies. For 
example, we may conduct studies that could not 
possibly yield noteworthy effect sizes, given that 
score reliability inherently attenuates effect 
sizes. Or we may not accurately interpret the 
effect sizes in our studies if we do not consider 
the reliability of the scores we are actually 
analyzing. (Thompson, 1994c, p. 840) 
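
The size of this toll can be approximated with the classical test theory attenuation relation, in which the expected observed correlation equals the correlation between true scores multiplied by the square root of the product of the two score reliabilities. That relation is not stated in the passage above and is brought in here as an assumption; the function name and the numerical values are also illustrative only.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation given the two score reliabilities."""
    return true_r * math.sqrt(rel_x * rel_y)


TRUE_R = 0.50  # hypothetical correlation between error-free scores

# The same underlying relationship, measured with scores of differing reliability.
r_high_reliability = attenuated_r(TRUE_R, rel_x=0.90, rel_y=0.90)
r_low_reliability = attenuated_r(TRUE_R, rel_x=0.60, rel_y=0.60)

print("observed r, reliabilities of .90:", round(r_high_reliability, 2))
print("observed r, reliabilities of .60:", round(r_low_reliability, 2))
```

Under these assumptions the same underlying relationship appears noticeably smaller when the scores being analyzed are less reliable, which is why a reported effect cannot be interpreted well without the reliability of the scores in hand.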

Error #5: Incorrectly Interpreting Statistical Significance:
Failing to Report Effect Sizes

As Pedhazur and Schmelkin (1991) noted, "probably very few 
methodological issues have generated as much controversy" (p. 198) 
as have the use and interpretation of statistical significance 
tests. These tests have proven surprisingly resistant to repeated 
efforts "to exorcise the null hypothesis" (Cronbach, 1975, p. 124). 
Especially noteworthy among the historical efforts to accomplish 
the exorcism have been works by Rozeboom (1960), Morrison and
Henkel (1970), Carver (1978), Meehl (1978), Shaver (1985), and 
Oakes (1986). 

More recently, a seemingly periodic series of articles on the 
extraordinary limits of statistical significance tests has been 
published in the American Psychologist (cf. Cohen, 1990, 1994; 
Kupfersmid, 1988; Rosenthal, 1991; Rosnow & Rosenthal, 1989). The 
entire Volume 61, Number 4 issue of the Journal of Experimental 
Education was devoted to these themes. Schmidt's (1996) APA 
Division 5 presidential address was published as the lead article 
in the second issue of the inaugural volume of the new APA journal,
Psychological Methods. The lead section (cf. Hunter, 1997) of the
January, 1997 issue of Psychological Science was devoted to this 
controversy. The April, 1998 issue of Educational and 
Psychological Measurement featured two lengthy reviews (Levin, 
1998; Thompson, 1998) of a major text (Harlow, Mulaik & Steiger, 
1997) on the controversy. And the APA Task Force on Statistical 
Inference (Shea, 1996) has now been working for nearly two years on 
related recommendations for improving practices.

Illustrative condemnations of contemporary statistical testing
practices can be noted. For example, Schmidt and Hunter (1997) 
recently argued that "Statistical significance testing retards the 
growth of scientific knowledge; it never makes a positive 
contribution" (p. 37). Rozeboom (1997) was equally direct: 

Null-hypothesis significance testing is surely the
most bone-headedly misguided procedure ever
institutionalized in the rote training of science
students... [I]t is a sociology-of-science
wonderment that this statistical practice has
remained so unresponsive to criticism... (p. 335)

But, without much question, two articles by the late Jacob 
Cohen (1990, 1994) have been the most influential. Roger Kirk 
(1996) characterized the two American Psychologist articles by 
Cohen as "classics," and argued that "the one individual most 
responsible for bringing the shortcomings of hypothesis testing to 
the attention of behavioral and educational researchers is Jacob 
Cohen" (p. 747).

This onslaught of criticism has provoked reactive advocacy for 
statistical tests (cf. Cortina & Dunlap, 1997; Frick, 1996; 
Greenwald, Gonzalez, Harris & Guthrie, 1996; Hagen, 1997; Robinson 
& Levin, 1997). Some of these treatments have been thoughtful, but
others have been seriously flawed (see Thompson, in press-c, in
press-d).

Yet, notwithstanding the long-term availability of these many 
publications, even today some researchers still do not understand 
what their statistical significance tests do and do not do.
Empirical studies of researcher perceptions of test results confirm
that researchers manifest these misconceptions (cf. Nelson,
Rosenthal & Rosnow, 1986; Oakes, 1986; Rosenthal & Gaito, 1963;
Zuckerman, Hodgins, Zuckerman & Rosenthal, 1993). Similarly,
content reviews of the most widely-used statistics textbooks show
that even our most distinguished methodologists do not have a good
grasp on the meaning of statistical significance tests (Carver,
1978).

My own views have been articulated in various locations (e.g., 
Thompson, 1993, 1994d, 1997a, in press-a, in press-d). I believe
that three other essays (Thompson, 1996, 1998, in press-b) are 
particularly noteworthy. And a short, public-domain ERIC Digest I 
published (Thompson, 1994b) may be very useful as a class handout. 

I have never argued that significance tests should be banned, 
though obviously others have argued that view (cf. Carver, 1978; 
Schmidt & Hunter, 1997). As an author, I do report (without much
excitement) the results of statistical significance tests. As an 
editor of three journals, I have accepted for publication 
manuscripts that report these tests. 

Common Misconceptions Regarding Statistical Tests 

In various locations I have criticized common misconceptions 
regarding the meaning and value of statistical tests (cf. Thompson, 
1996, in press-b). Three of these I now briefly summarize here. 

Statistical Significance Does Not Test Result Importance. Put
simply, improbable events are not intrinsically interesting. Some
highly improbable events, in fact, are completely inconsequential.
In his classic hypothetical dialogue between two teachers, Shaver
(1985, p. 58) poignantly illustrated the folly of equating result
improbability with result importance:

Chris: ...I set the level of significance at .05, as my
advisor suggested. So a difference that large
would occur by chance less than five times in a
hundred if the groups weren't really different.
An unlikely occurrence like that surely must be
important.

Jean: Wait a minute, Chris. Remember the other day when 
you went into the office to call home? Just as 
you completed dialing the number, your little boy 
picked up the phone to call someone. So you were 
connected and talking to one another without the 
phone ever ringing... Well, that must have been a 
truly important occurrence then? 

Even more importantly, since the premises of statistical 
significance tests do not invoke human values, in valid logical 
argument statistical results therefore can not under any 
circumstances contain as part of their conclusions information 
about result value. As I have noted previously, "If the computer 
package did not ask you your values prior to its analysis, it could 
not have considered your value system in calculating p's, and so 
p's cannot be blithely used to infer the value of research results"
(Thompson, 1993, p. 365). Thus, statistical tests cannot reasonably
be used as an atavistic escape from responsibility for defending
result importance (Thompson, 1993), or to maintain a mantle of
feigned objectivity (Thompson, in press-b).

Statistical Significance Does Not Test Result Replicability.
Social scientists seek to identify relationships that recur under 
stated conditions. Discovering analogs of cold fusion will make us 
extremely popular (free drinks, much dancing, etc.) at our next 
scholarly meeting, but we will eternally thereafter be shunned (no 
one will accept the drinks we attempt to buy for them, so much for 
the dancing, etc.) at all future conferences, once our results are 
discovered to be non-replicable. [So, only report non-replicable 
results at your last conference, immediately prior to retirement.]

Too many researchers, consciously or unconsciously,
incorrectly assume that the p values calculated in statistical
significance tests evaluate the probability that results will
replicate (Carver, 1978, 1993). But statistical tests do not
evaluate the probability that the sample statistics occur in the
population as parameters (Cohen, 1994).

Instead, "pCALCULATED [is the] probability (0 to 1.0) of the sample
statistics, given the sample size, and assuming the sample was
derived from a population in which the null hypothesis (H0) is
exactly true" (Thompson, 1996, p. 27). Obviously, knowing the
probability of the sample is less interesting than knowing the 
probability of the population. Knowing the probability of 
population parameters would bear upon result replicability, since 
we would then know something about the population from which future 
researchers would also draw their samples. 

But as Shaver (1993) argued so emphatically: 

[A] test of statistical significance is not an 
indication of the probability that a result would be
obtained upon replication of the study.... Carver's
(1978) treatment should have dealt a death blow to
this fallacy.... (p. 304)

And so Cohen (1994) concluded that the statistical significance 
test "does not tell us what we want to know, and we so much want to 
know what we want to know that, out of desperation, we nevertheless 
believe that it does!" (p. 997). 

Statistical Significance Does Not Solely Evaluate Effect
Magnitude. Because various study features (including score
reliability) impact calculated p values, pCALCULATED cannot be used as
a satisfactory index of study effect size. As I have noted
elsewhere,

The calculated p values in a given study are a 
function of several study features, but are 
particularly influenced by the confounded, joint 
influence of study sample size and study effect 
sizes. Because p values are confounded indices, in 
theory 100 studies with varying sample sizes and 100 
different effect sizes could each have the same 
single pCALCULATED, and 100 studies with the same single 
effect size could each have 100 different values for 
pCALCULATED. (Thompson, in press-b) 
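
The confounding described in this passage can be made concrete with a small numerical sketch. It is not taken from the cited sources: it assumes a simple bivariate correlation r, tests the nil null with the large-sample Fisher z approximation rather than an exact t test, and uses hypothetical function names. Holding r fixed while n varies changes p, and holding p fixed while n varies changes the r that produced it.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def p_for_r(r, n):
    """Approximate two-sided p for H0: rho = 0, using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def r_for_p(p, n):
    """Positive r that yields the two-sided p at sample size n (same approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    # One effect size, several sample sizes: p changes although r does not.
    for n in (20, 50, 100, 400):
        print(f"r = .20, n = {n:3d}: p = {p_for_r(0.20, n):.4f}")
    # One p value, several sample sizes: the effect size behind p = .05 changes.
    for n in (20, 50, 100, 400):
        print(f"p = .05, n = {n:3d}: r = {r_for_p(0.05, n):.3f}")
```

Under these assumptions the same r of .20 is far from statistically significant with 20 people and clearly statistically significant with 400, which is the sense in which p is a confounded index.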

The recent fourth edition of the American Psychological 
Association style manual (APA, 1994) explicitly acknowledges that 
p values are not acceptable indices of effect: 

Neither of the two types of probability values 
[statistical significance tests] reflects the 
importance or magnitude of an effect because both 
depend on sample size. . . You are [therefore] 
encouraged to provide effect-size information. (APA, 
1994, p. 18, emphasis added) 

Recommended Improvements in Statistical Testing Practices 

In various locations (cf. Thompson, 1996, in press-b) I have 
advocated certain changed practices as regards the use of 
statistical tests. Five such suggested changes are now summarized 
here. 

Effect Sizes Should Be Reported for All Tested Effects. The 
single most important potential improvement in analytic practice 
would be the regular and routine reporting of effect sizes in all 
studies. As noted previously, such reports are at least 
"encouraged" by the new APA (1994, p. 18) style manual. 

However, empirical studies of articles published since 1994 in 
psychology, counseling, special education, and general education 
suggest that merely "encouraging" effect size reporting (APA, 1994) 
has not appreciably affected actual reporting practices (e.g., 
Kirk, 1996; Snyder & Thompson, in press; Thompson & Snyder, 1997, 
in press; Vacha-Haase & Nilsson, in press). An on-going series of 
additional empirical studies of reporting practices has yielded 
similar results for yet more journals (Lance & Vacha-Haase, 1998; 
Ness & Vacha-Haase, 1998; Nilsson & Vacha-Haase, 1998; Reetz & 
Vacha-Haase, 1998). 

Effect sizes are important to report for at least two reasons. 
First, when these effects are noteworthy, these indices inform 
judgment regarding the practical or substantive significance of 
results (cf. Kirk, 1996). Second, reporting all effect sizes (even 
non-statistically significant effects, though some might not 
interpret them) facilitates the meta-analytic integration of 
findings across a given literature. 

There are many effect sizes (e.g., "uncorrected," "corrected," 
standardized differences) that can be computed (cf. Kirk, 1996; 
Snyder & Lawson, 1993). In my view (Thompson, in press-b), 
arguments can be made that certain indices should be preferred over 
others. But the important point is that, as regards effect size 
reporting, it is generally better to report anything as against 
nothing, which is the effect size that most researchers currently 
report. 
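
As a rough illustration of the three families of indices named above, the following sketch computes an "uncorrected" variance-accounted-for index (eta squared), a "corrected" one (omega squared), and a standardized difference (Cohen's d with a pooled standard deviation). The scores are hypothetical and the function names are illustrative assumptions; the formulas are the conventional one-way ANOVA definitions, not a procedure prescribed by the sources cited here.

```python
from math import sqrt
from statistics import mean, variance


def anova_effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way layout."""
    scores = [y for g in groups for y in g]
    grand_mean = mean(scores)
    ss_total = sum((y - grand_mean) ** 2 for y in scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_within = (ss_total - ss_between) / df_within
    eta_sq = ss_between / ss_total  # "uncorrected"
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)  # "corrected"
    return eta_sq, omega_sq


def standardized_difference(a, b):
    """Cohen's d for two groups, using the pooled sample standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var)


# Hypothetical scores for two groups of five people each.
group_a = [10, 12, 14, 16, 18]
group_b = [13, 15, 17, 19, 21]
eta_sq, omega_sq = anova_effect_sizes([group_a, group_b])
d = standardized_difference(group_a, group_b)
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}, d = {d:.3f}")
```

With so few scores the corrected index is noticeably smaller than the uncorrected one, which is one reason the choice among indices is worth reporting explicitly.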

Of course, an effect size is no more magical than is 
statistical significance testing, for the two reasons noted by 
Zwick (1997) . First, because human values are also not part of the 
calculation of an effect size, any more than values are part of the 
calculation of p, "largeness of effect does not guarantee practical 
importance any more than statistical significance does" (p. 4) . 

Second, some researchers have too rigidly adopted Cohen's 
(1988) definitions of small, medium and large effects, just as some 
researchers too rigidly adopted "α=.05" as their gold standard. 
Cohen (1988) only intended these as impressionistic 
characterizations of result typicality across a diverse literature. 
However, some empirical studies do suggest that the 
characterization is reasonably accurate (Glass, 1979; Olejnik, 
1984), at least as regards a literature historically built with a 
bias against statistically non-significant results (Rosenthal, 

In my view, editorial requirements (Vacha-Haase, 1998b) will 
ultimately be required to move the field to change analytic and 
reporting practices. Fortunately, editorial policies at some 
journals now require authors to report and interpret effect sizes. 
For example, the author guidelines of the Journal of Experimental 
Education indicate that "authors are required to report and 
interpret magnitude-of-effect measures in conjunction with every p 
value that is reported" (Heldref Foundation, 1997, pp. 95-96, 
emphasis added). I believe the EPM author guidelines are equally 
informed: 

We will go further [than mere encouragement]. 
Authors reporting statistical significance will be 
required to both report and interpret effect sizes. 
However, these effect sizes may be of various forms, 
including standardized differences, or uncorrected 
(e.g., r², R², eta²) or corrected (e.g., adjusted R², 
omega²) variance-accounted-for statistics. (Thompson, 
1994c, p. 845, emphasis in original) 

It is particularly noteworthy that editorial policies even at 
one APA journal now indicate that: 

If an author decides not to present an effect size 
estimate along with the outcome of a significance 
test, I will ask the author to provide specific 
justification for why effect sizes are not reported. 
So far, I have not heard a good argument against 
presenting effect sizes. Therefore, unless there is 
a real impediment to doing so, you should routinely 
include effect size information in the papers you 
submit. (Murphy, 1997, p. 4) 

Researchers Should More Frequently Employ Non-Nil Nulls. An 
important but overlooked (see Hagen, 1997; Thompson, in press-c) 
element of Cohen's (1994) classic article involved his striking 
criticism of the routine use of "nil" null hypotheses. Cohen (1994) 
defined a "nil" null hypothesis as a null specifying no differences 
(e.g., SD1 - SD2 = 0) or zero correlations (e.g., R² = 0). 

Some researchers employ nil nulls because statistical theory 
does not easily accommodate the testing of some non-nil nulls. But 
in other cases researchers employ nil nulls because these nulls 
have been unconsciously accepted as traditional, because these 
nulls can be mindlessly formulated without consulting previous 
literature, or because most computer software defaults to tests of 
nil nulls (Thompson, 1998, in press-b, in press-c). 

Unfortunately, when a statistical significance test presumes 
a nil null is true in the population, an untruth is posited. As 
Meehl (1978, p. 822) noted, "As I believe is generally recognized 
by statisticians today and by thoughtful social scientists, the 
null hypothesis, taken literally, is always false." Similarly, 
Hays (1981, p. 293) pointed out that "[t]here is surely nothing on 
earth that is completely independent of anything else [in the 
population]. The strength of association may approach zero, but it 
should seldom or never be exactly zero." 

Highly respected statistician Roger Kirk (1996) put the point 
succinctly in his important recent article: 

Because the null hypothesis is always false, a 
decision to reject it simply indicates that the 
research design had adequate power to detect a true 
state of affairs, which may or may not be a large 
effect or even a useful effect. It is ironic that a 
ritualistic adherence to null hypothesis 
significance testing has led researchers to focus on 
controlling the Type I error that cannot occur 
because all null hypotheses are false. (p. 747, 
emphasis added) 

And a pCALCULATED value computed on the foundation of a false premise 
is inherently of somewhat limited utility. 

There is a very important implication of the realization that 
the nil null is untrue in the population. As Hays (1981, p. 293) 
emphasized, because the nil null is untrue in the population, 
sample statistics should reflect some difference or some effect, 
and thus "virtually any study can be made to show significant 
results if one uses enough subjects." This means that 

Statistical significance testing can involve a 
tautological logic in which tired researchers, 
having collected data from hundreds of subjects, 
then conduct a statistical test to evaluate whether 
there were a lot of subjects, which the researchers 
already know, because they collected the data and 
know they're tired. (Thompson, 1992c, p. 436) 

Statistical significance would be considerably more informative if 
researchers reviewed relevant previous research, and then 
constructed hypotheses that incorporated previous results. 

Measurement Results Should be Tested with Non-Nil Nulls. There 
is growing recognition that some uses of statistical tests in 
measurement studies, as regards reliability or validity 
coefficients or construct validity tests of means, can be 
particularly misguided. For example, Abelson (1997) commented on 
statistical tests of measurement study results using nil null 
hypotheses: 

And when a reliability coefficient is declared to be 
nonzero, that is the ultimate in stupefyingly 
vacuous information. What we really want to know is 
whether an estimated reliability is .50'ish or 
.80'ish. (Abelson, 1997, p. 121) 

Fortunately, the author guidelines of some journals have become 
more enlightened as regards such practices: 

Statistical tests of such coefficients in a 
measurement context make little sense. Either 
statistical significance tests using the [nil] null 
hypothesis of zero magnitude should be by-passed, or 
meaningful null hypotheses should be employed. 
(Thompson, 1994c, p. 844) 
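
A minimal sketch of what testing against a "meaningful" null might look like follows. It assumes the coefficient is a Pearson-type correlation (for example a test-retest or validity coefficient) and uses the large-sample Fisher z approximation; coefficients such as alpha have their own sampling distributions and would need a different test. The values r = .62, n = 50, and the non-nil null of .50 are hypothetical, as is the function name.

```python
import math
from statistics import NormalDist


def p_against_null(r, n, rho0=0.0):
    """Approximate two-sided p for H0: rho = rho0, via Fisher's z transformation."""
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical coefficient of .62 obtained with 50 people.
p_nil = p_against_null(0.62, 50, rho0=0.0)       # "is the coefficient nonzero?"
p_non_nil = p_against_null(0.62, 50, rho0=0.50)  # "does it differ from .50?"
print(f"nil null: p = {p_nil:.6f}; non-nil null of .50: p = {p_non_nil:.3f}")
```

In this hypothetical case the nil null is rejected easily, which conveys almost nothing, while the test against .50 is not statistically significant, which bears directly on whether the coefficient is ".50'ish" or larger.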

Researchers Should Provide Some Warrant That Results Are 
Replicable. Because evidence of result replicability is important 
(if we take science to be the business of cumulating knowledge 
across studies), and because statistical significance tests do not 
evaluate result replicability (Cohen, 1994; Thompson, 1996, 1997b), 
other methods must and should be used for this purpose. It has been 

As more researchers finally realize that statistical 
significance tests do not test the population, and 
therefore do not test replicability, researchers 
will increasingly emphasize evidence that instead is 
relevant to the issue of result replicability. 
(Vacha-Haase & Thompson, in press) 

Many warrants are available, and in fact a single study might 
present several such warrants. 

The most persuasive, and perhaps the only conclusive, evidence 
for result replicability is to actually replicate the study. And 
replication studies are important, and probably are somewhat 
undervalued in the social sciences (Robinson & Levin, 1997) . 
However, many researchers (especially doctoral students working on 
dissertations and junior faculty seeking tenure) find themselves 
unable to replicate every study. 

One potential warrant for replicability would involve 
prospectively formulating null hypotheses by reflectively 
consulting the effect sizes reported in previous related studies, 
and by prospectively interpreting study effects in the context of 
specific previous findings. In effect, virtually any study might be 
conducted and interpreted as a partial replication of previous 
inquiry. Another alternative warrant involves empirical 
investigation of replicability by conducting what I have termed 
(cf. Thompson, 1996) "internal" replicability analyses. 

"Internal" replicability analyses empirically use the sample 
in hand to combine the participants in different ways to estimate 
how much the idiosyncrasies of individuality within the sample have 
compromised generalizability. The major "internal" empirical 
replicability analyses are cross-validation, the jackknife, and the 
bootstrap (Diaconis & Efron, 1983); the logics are reviewed in more 
detail elsewhere (cf. Thompson, 1993, 1994d). "Internal" evidence 
for replicability is never as good as an actual replication 
(Robinson & Levin, 1997; Thompson, 1997a), but is certainly better 
than incorrectly presuming that statistical significance assures 
result replicability. 
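
The jackknife logic can be sketched in a few lines. This is only an illustration under stated assumptions: the statistic is a Pearson r, the eight pairs of scores are hypothetical (the last person is deliberately atypical), and the helper names are not from any cited software. The statistic is recomputed with each participant omitted in turn, and the spread of the results shows how much a single person drives the estimate.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation for paired scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jackknife_r(x, y):
    """Recompute r with each participant omitted in turn (leave-one-out)."""
    return [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(len(x))]


# Hypothetical scores for eight participants.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 1]
full_r = pearson_r(x, y)
loo = jackknife_r(x, y)
print(f"full-sample r = {full_r:.3f}")
print(f"leave-one-out r ranges from {min(loo):.3f} to {max(loo):.3f}")
```

A wide leave-one-out range, as in this contrived example, is a warning that the full-sample estimate reflects the idiosyncrasies of particular people rather than a stable result.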

However, it must be emphasized that the inferential and the 
descriptive uses of these logics should not be confused (Thompson, 
1993) . For example, the inferential use of the bootstrap involves 
using the bootstrap to estimate a sampling distribution when the 
sampling distribution is not known or assumptions for the use of a 
known sampling distribution cannot be met (i.e., to conduct a 
different form of statistical significance test) . The descriptive 
use of the bootstrap looks primarily at the variability in effect 
sizes or other parameter estimates across many different 
combinations of the participants. The software to conduct 
"internal" bootstrap analyses for statistics commonly used in the 
social sciences (cf. Elmore & Woehlke, 1988; Goodwin & Goodwin, 
1985) is already widely available (e.g., Lunneborg (1987) for 
univariate applications, and Thompson (1988b, 1992a, 1995a) for 
multivariate applications). 
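
The descriptive use of the bootstrap can be sketched as follows. This is not the software cited above; it is a minimal illustration that assumes two groups of hypothetical scores, resamples participants with replacement within each group, and summarizes how much an unstandardized effect (the mean difference) varies across resamples. The function name, the number of resamples, and the seed are arbitrary assumptions.

```python
import random
from statistics import mean, stdev


def bootstrap_mean_difference(group_a, group_b, n_resamples=2000, seed=1998):
    """Mean difference (b - a) recomputed over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        estimates.append(mean(resample_b) - mean(resample_a))
    return estimates


# Hypothetical scores for two groups of eight people each.
group_a = [10, 12, 14, 16, 18, 11, 15, 13]
group_b = [13, 15, 17, 19, 21, 14, 18, 16]
observed = mean(group_b) - mean(group_a)
boot = bootstrap_mean_difference(group_a, group_b)
print(f"observed difference = {observed:.2f}")
print(f"bootstrap mean = {mean(boot):.2f}, bootstrap SD = {stdev(boot):.2f}")
```

Used descriptively, the interest is in the spread of the resampled effects relative to the observed effect, not in converting that spread into another statistical significance test.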

Improved Language Use. In Thompson (1996), I suggested that 
when the null hypothesis is rejected, "such results ought to always 
be described as 'statistically significant,' and should never be 
described only as 'significant'" (pp. 28-29). My argument 
(Thompson, 1996, 1997a; but see Robinson & Levin, 1997) has been 
that the common meaning of "significant" has nothing to do with the 
statistical use of this term, and that the use of the complete 
phrase might help at least some in conveying that this technical 
phrase has nothing to do with result importance. 

Carver (1993) eloquently made the same argument: 

When trying to emulate the best principles of 
science, it seems important to say what we mean and 
to mean what we say. Even though many readers of 
scientific journals know that the word significant 
is supposed to mean statistically significant when 
it is used in this context, many readers do not know 
this. Why be unnecessarily confusing when clarity 
should be most important? (p. 288, emphasis in 
original) 



Summary 

After presenting a general linear model as a framework for 
discussion, the present paper reviewed five methodology errors that 
occur in educational research: (a) the use of stepwise methods; (b) 
the failure to consider in result interpretation the context 
specificity of analytic weights (e.g., regression beta weights, 
factor pattern coefficients, discriminant function coefficients, 
canonical function coefficients) that are part of all parametric 
quantitative analyses; (c) the failure to interpret both weights 
and structure coefficients as part of result interpretation; (d) 
the failure to recognize that reliability is a characteristic of 
scores, and not of tests; and (e) the incorrect interpretation of 
statistical significance and the related failure to report and 
interpret the effect sizes present in all quantitative analyses. In 
several cases small heuristic discriminant analysis data sets were 
presented to make more concrete and accessible the discussion of 
each of these five methodology errors. 

However, of the various arenas for improvement, the one where 
I believe the most progress could be realized involves the use of 
statistical significance tests and the reporting of effect sizes. 
Yet this is where the most resistance has seemingly occurred. For 
example, Schmidt and Hunter (1997) recently argued that 
"logic-based arguments seem to have had only a limited impact. . . [perhaps 
due to] the virtual brainwashing in significance testing that all 
of us have undergone" (pp. 38-39). They also spoke of a "psychology 
of addiction to significance testing" (Schmidt & Hunter, 1997, p. 
49). 

Journal editor Loftus (1994), like others, has lamented that 
repeated publications of 

these concerns never seem to attract much attention 
(much less impel action). They are carefully crafted 
and put forth for consideration, only to just kind 
of dissolve away in the vast acid bath of our 
existing methodological orthodoxy. (p. 1) 

Another editor commented: "p values are like mosquitos" that 
apparently "have an evolutionary niche somewhere and 
[unfortunately] no amount of scratching, swatting or spraying will 
dislodge them" (Campbell, 1982, p. 698). 

Similar comments have been made by non-editors. For example, 
Falk and Greenbaum (1995) noted that "A massive educational effort 
is required to... extinguish the mindless use of a procedure that 
dies hard" (p. 94). And Harris (1991) observed, "it is surprising 
that the dragon will not stay dead" (p. 375). 

Fortunately, some slow, glacial progress in the incremental 
movement of the field was reflected in the APA (1994, p. 18) style 
manual "encouraging" the reporting of effect sizes. But enlightened 
editorial policies (e.g., Heldref Foundation, 1997; Murphy, 1997; 
Thompson, 1994c) now provide the strongest basis for cautious 
optimism. 

Table 1 

Correlation Coefficients for Selected 
Holzinger and Swineford (1939) Data Used to Illustrate That 
SEM is the Most General Case of the General Linear Model 

           T6       T7   |     T2       T4      T20      T21      T22
T6     1.0000    .7332   |   .1529    .1586    .3440    .3206    .4476
T7      .7332   1.0000   |   .1394    .0772    .3367    .3020    .4698
T2      .1529    .1394   |  1.0000    .3398    .2812    .2433    .2812
T4      .1586    .0772   |   .3398   1.0000    .3243    .3310    .3062
T20     .3440    .3367   |   .2812    .3243   1.0000    .3899    .3947
T21     .3206    .3020   |   .2433    .3310    .3899   1.0000    .3767
T22     .4476    .4698   |   .2812    .3062    .3947    .3767   1.0000

Note. The variable labels for these seven variables are: 

T6   PARAGRAPH COMPREHENSION TEST 
T7   SENTENCE COMPLETION TEST 
T2   CUBES, SIMPLIFICATION OF BRIGHAM'S SPATIAL RELATIONS TEST 
T4   LOZENGES FROM THORNDIKE -- SHAPES FLIPPED OVER THEN IDENTIFY TARGET 
T20  DEDUCTIVE MATH ABILITY 
T21  MATH NUMBER PUZZLES 
T22  MATH WORD PROBLEM REASONING 



Table 2 

Standardized Canonical Function Coefficients for the Table 1 Data 
Derived Using the Appendix A SPSS/LISREL Program to Illustrate That 
SEM is the Most General Case of the General Linear Model 

Standardized canonical coefficients for DEPENDENT variables 

Variable            1           2
T6             .44962    -1.40007
T7             .62246     1.33225

Standardized canonical coefficients for COVARIATES 

COVARIATE           1           2
T2            -.01468      .06704
T4            -.20012    -1.00653
T20            .34100     -.02762
T21            .26772     -.17401
T22            .73104      .35974

Table 3 

LISREL "Gamma" Coefficients for the Table 1 Data 
Derived Using the Appendix A SPSS/LISREL Program to Illustrate That 
SEM is the Most General Case of the General Linear Model 

GAMMA            T6          T7
ETA 1       0.44957     0.62250

GAMMA            T2          T4         T20         T21         T22
ETA 1      -0.01468    -0.20014     0.34100     0.26772     0.73104

GAMMA            T6          T7
ETA 1       0.44956     0.62251
ETA 2       1.40013    -1.33228

GAMMA            T2          T4         T20         T21         T22
ETA 1      -0.01469    -0.20014     0.34101     0.26771     0.73104
ETA 2      -0.06706     1.00653     0.02762     0.17402    -0.35972

Note. The LISREL coefficients for the "gamma" matrix exactly match 
(within rounding error) the canonical function coefficients 
presented previously. The only exception is that all the signs for 
the SEM second canonical function coefficients must be "reflected." 
"Reflecting" a function (changing all the signs on a given 
function, factor, or equation) is always permissible, because the 
scaling of psychological constructs is arbitrary. Thus, the SEM and 
the canonical analysis derived the same results. 
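
The comparison described in this note can be checked mechanically. The sketch below copies the coefficients from Tables 2 and 3 and asks whether each pair of functions agrees as printed or only after every sign is reversed; the function name and the rounding tolerance of .001 are illustrative assumptions.

```python
# Canonical function coefficients (Table 2) and two-function LISREL gammas (Table 3).
canonical = {
    1: {"T6": 0.44962, "T7": 0.62246, "T2": -0.01468, "T4": -0.20012,
        "T20": 0.34100, "T21": 0.26772, "T22": 0.73104},
    2: {"T6": -1.40007, "T7": 1.33225, "T2": 0.06704, "T4": -1.00653,
        "T20": -0.02762, "T21": -0.17401, "T22": 0.35974},
}
lisrel = {
    1: {"T6": 0.44956, "T7": 0.62251, "T2": -0.01469, "T4": -0.20014,
        "T20": 0.34101, "T21": 0.26771, "T22": 0.73104},
    2: {"T6": 1.40013, "T7": -1.33228, "T2": -0.06706, "T4": 1.00653,
        "T20": 0.02762, "T21": 0.17402, "T22": -0.35972},
}


def compare(a, b, tol=1e-3):
    """Return 'same', 'reflected' (all signs reversed), or 'different'."""
    if all(abs(a[k] - b[k]) <= tol for k in a):
        return "same"
    if all(abs(a[k] + b[k]) <= tol for k in a):
        return "reflected"
    return "different"


for function in (1, 2):
    print(f"Function {function}: {compare(canonical[function], lisrel[function])}")
```

Reversing every sign on a function leaves the relative weighting of the variables unchanged, which is why a "reflected" result counts as agreement.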

Table 4 

The Confusing Language of Statistics 
(Intentionally Designed to Confuse the Graduate Students) 

                         Standardized              Weight         Synthetic/Latent
Analysis                 Weights*                  System         Variable(s)

Multiple Regression      beta weights              "equation"     Yhat (Ŷ)

Factor Analysis          pattern coefficients      "factor"       factor scores

Descriptive              standardized function     "function"     discriminant
Discriminant Analysis    coefficients              -or- "rule"    function scores

Canonical Correlation    standardized function     "function"     canonical
Analysis                 coefficients                             function scores

*Of course, the term, "standardized weight", is an obvious oxymoron. 
A given weight is a constant applied to all the scores of all the 
cases/people on the observed/manifest/ measured variable, and 
therefore cannot be standardized. Instead, the weighting constant 
is applied to the measured variable in its standardized form, i.e., 
we should say "weight for the standardized measured variables" 
rather than "standardized weight". 
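
The point of this footnote can be shown in a short sketch: the weights stay constant across people, and it is the measured variables that are standardized before the weights are applied. The raw scores below are hypothetical; the two weights are the Function 1 coefficients for T6 and T7 from Table 2, and the function names are illustrative assumptions.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores on one measured variable to z scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def synthetic_scores(measured, weights):
    """Apply constant weights to the standardized measured variables."""
    z = {name: z_scores(values) for name, values in measured.items()}
    n_people = len(next(iter(measured.values())))
    return [sum(weights[name] * z[name][i] for name in weights) for i in range(n_people)]


# Hypothetical raw scores for five people on two measured variables.
measured = {"T6": [9, 12, 7, 15, 11], "T7": [18, 22, 15, 27, 20]}
# Weights for the standardized variables (Function 1, Table 2).
weights = {"T6": 0.44962, "T7": 0.62246}
scores = synthetic_scores(measured, weights)
print([round(s, 3) for s in scores])
```

Every person's T6 z score is multiplied by the same .44962, so nothing about the weight itself is standardized; only the variable it multiplies is.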

Table 5 

Holzinger and Swineford Data to Show That 
More Predictors May Actually Hurt Classification Accuracy 

Seq   ID  GRADE  T13  T17  T22  T16
  1    2      7  285   12   21  100
  2    3      7  159    1   18   95
  3    9      7  265   18   18  105
  4   14      7  211    8   22  103
  5   16      7  211    5   34  102
  6   18      7  189   13   16  100
  7   20      7  207    3   47  107
  8   22      7  194    8   19   96
  9   25      7  244    6   20   99
 10   28      7  163   12   24  106
 11   30      7  310   10   20  101
 12   34      7  121    3   18   92
 13   44      7  167   11   22  112
 14   46      7  100    4   25   58
 15   47      7  240    6   20  103
 16   50      7  226    4   39  109
 17   51      7  196    8   18   96
 18   52      7  218    7   18   92
 19   58      7  151   15   25  102
 20   66      7  142    3   13   95
 21   68      7  172   10   32  110
 22   71      7  181    9   27  107
 23   74      7  153   15   21   99
 24   75      7  141   14   19  107
 25   76      7  195   10   19  103
 26   78      7  186    7   30  109
 27   79      7  215   10   15  103
 28   81      7  165   11   22  108
 29   83      7  233    2   28  100
 30   85      7  203    8   24  103
 31  202      7  195    9   22  106
 32  203      7  228    1   43  101
 33  205      7  160    9   35   99
 34  208      7  333   16   45  118
 35  213      7  154    3   19  106
 36  225      7  236   21   29  116
 37  226      7  219    6   23  104
 38  230      7  189    1    7   99
 39  232      7  143    2   27   94
 40  235      7  162    3   16  100
 41  236      7  205    6   27  101
 42  239      7  112    3   18   90
 43  244      7  137    0   24  105
 44  245      7  214    4   26  100
 45  250      7  120    3   28  112
 46  252      7  165    1   10  101
 47  253      7  137    1   15   89
 48  256      7  214    4   28   97
 49  257      7  223    5   23  106
 50  263      7  205    5   35  103
 51  264      7  180    6   36   97
 52  268      7  130    3   14  103
 53  269      7  220    4   31  113
 54  277      7  149    1   21   96
 55   86      8  207   19   37  112
 56   88      8  217   24   20  106
 57   89      8  191   10   27  109
 58   90      8  208    9   17   98
 59  106      8  260   17   41  104
 60  112      8  148   11   34  105
 61  118      8  271   11   34  113
 62  120      8  175   10   24  111
 63  126      8  180   11   21   96
 64  131      8  247   20   26  101
 65  132      8  119    2   28   91
 66  133      8  234   14   44  113
 67  134      8  172   23   26   99
 68  137      8  177   11   25   93
 69  139      8  208   18   34  107
 70  140      8  227    9   13  108
 71  143      8  259   16   23  107
 72  148      8  196    7   39   96
 73  150      8  248   17   32  110
 74  151      8  255   26   34  112
 75  153      8  206   11   16  105
 76  155      8  238   16   49  102
 77  158      8  227   18   15  101
 78  160      8  197    6   25  100
 79  165      8  195    9   29   91
 80  282      8  241    1   27  115
 81  283      8  230    4   26  103
 82  284      8  200   11    8  108
 83  285      8  246   16   33  109
 84  287      8  227   11   48  109
 85  288      8  168   11   28  104
 86  289      8  224   13   43  104
 87  290      8  189    7   38  110
 88  297      8  199    8   30  108
 89  298      8  249   15   50  119
 90  299      8  212    7   29  102
 91  304      8  210    5   27  104
 92  311      8  198    7   34  107
 93  312      8  237    6   18  108
 94  313      8  206   15   50  107
 95  315      8  215    5   27  101
 96  317      8  183    9   18  113
 97  318      8  187    8   35  109
 98  322      8  220    7   26  109
 99  323      8  178    8   27  103
100  324      8  150    6    8  102
101  329      8  235    6   18  101
102  338      8  206   26   37  113
103  341      8  174    7   46  105
104  342      8  162    9   29   96
105  343      8  228    1   39  104
106  345      8  204    7   25  112
107  351      8  186   25   39  109

Note. The variable labels are: 

T13  SPEEDED DISCRIM STRAIGHT AND CURVED CAPS 
T17  MEMORY OF OBJECT-NUMBER ASSOCIATION TARGETS 
T22  MATH WORD PROBLEM REASONING 
T16  MEMORY OF TARGET SHAPES 

Table 6 

Holzinger and Swineford Results to Show That 
More Predictors May Actually Hurt Classification Accuracy 
-- LDF and LCF Score Classification Tables -- 

GRADE by LDFCL3  LDF classification 3 predictors 

                 LDFCL3
      Count |      7 |      8 |    Row
GRADE       |        |        |  Total
         ---+--------+--------+
          7 |     40 |     14 |     54
            |        |        |   50.5
            +--------+--------+
          8 |     22 |     31 |     53
            |        |        |   49.5
            +--------+--------+
     Column       62       45      107
      Total     57.9     42.1    100.0

GRADE by LDFCL4  LDF classification 4 predictors 

                 LDFCL4
      Count |      7 |      8 |    Row
GRADE       |        |        |  Total
         ---+--------+--------+
          7 |     38 |     16 |     54
            |        |        |   50.5
            +--------+--------+
          8 |     23 |     30 |     53
            |        |        |   49.5
            +--------+--------+
     Column       61       46      107
      Total     57.0     43.0    100.0

GRADE by LCFCL3  LCF classification 3 predictors 

                 LCFCL3
      Count |      7 |      8 |    Row
GRADE       |        |        |  Total
         ---+--------+--------+
          7 |     40 |     14 |     54
            |        |        |   50.5
            +--------+--------+
          8 |     22 |     31 |     53
            |        |        |   49.5
            +--------+--------+
     Column       62       45      107
      Total     57.9     42.1    100.0

GRADE by LCFCL4  LCF classification 4 predictors 

                 LCFCL4
      Count |      7 |      8 |    Row
GRADE       |        |        |  Total
         ---+--------+--------+
          7 |     38 |     16 |     54
            |        |        |   50.5
            +--------+--------+
          8 |     23 |     30 |     53
            |        |        |   49.5
            +--------+--------+
     Column       61       46      107
      Total     57.0     43.0    100.0
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Table 7 

Holzinger and Swineford Results to Show That 
More Predictors May Actually Hurt Classification Accuracy 
— Both LDF and LCF Actual Classifications — 



Sea 


ID 


GRADE 


LDFCL3 


LDFCL4 


LCFCL3 


LCFCL4 


1 


2 


7 


8 


8 


8 


8 


2 


3 


7 


7 


7 


7 


7 


3 


9 


7 


8 


8 


8 


8 


4 


14 


7 


7 


7 


7 


7 


5 


16 


7 


7 


7 


7 


7 


6 


18 


7 


7 


7 


7 


7 


7 


20 


7 


8 


8 


8 


8 


8 


22 


7 


7 


7 


7 


7 


9 


25 


7 


7 


7 


7 


7 


10 


28 


7 


8 


8 


8 


8 


11 


30 


7 


+ 8 


7 


8 


7 


12 


34 


7 


7 


7 


7 


7 


13 


44 


7 


7 


8 


7 


8 


14 


46 


7 


7 


7 


7 


7 


15 


47 


7 


7 


7 


7 


7 


16 


50 


7 


8 


8 


8 


8 


17 


51 


7 


7 


7 


7 


7 


18 


52 


7 


7 


7 


7 


7 


19 


58 


7 


8 


8 


8 


8 


20 


66 


7 


7 


7 


7 


7 


21 


68 


7 


8 


8 


8 


8 


22 


71 


7 


7 


8 


7 


8 


23 


74 


7 


8 


8 


8 


8 


24 


75 


7 


8 


8 


8 


8 


25 


76 


7 


7 


7 


7 


7 


26 


78 


7 


7 


8 


7 


8 


27 


79 


7 


7 


7 


7 


7 


28 


81 


7 


7 


8 


7 


8 


29 


83 


7 


7 


7 


7 


7 


30 


85 


7 


7 


7 


7 


7 


31 


202 


7 


7 


7 


7 


7 


32 


203 


7 


7 


7 


7 


7 


33 


205 


7 


8 


8 


8 


8 


34 


208 


7 


8 


8 


8 


8 


35 


213 


7 


7 


7 


7 


7 


36 


225 


7 


8 


8 


8 


8 


37 


226 


7 


7 


7 


7 


7 


38 


        230    7    7  7  7  7
 39     232    7    7  7  7  7
 40     235    7    7  7  7  7
 41     236    7    7  7  7  7
 42     239    7    7  7  7  7
 43     244    7    7  7  7  7
 44     245    7    7  7  7  7
 45     250    7    7  7  7  7
 46     252    7    7  7  7  7
 47     253    7    7  7  7  7
 48     256    7    7  7  7  7
 49     257    7    7  7  7  7
 50     263    7    7  7  7  7
 51     264    7    8  7  8  7
 52     268    7    7  7  7  7
 53     269    7    7  7  7  7
 54     277    7    7  7  7  7
 55      86    8    8  8  8  8
 56      88    8    8  8  8  8
 57      89    8    8  8  8  8
 58      90    8    7  7  7  7
 59     106    8    8  8  8  8
 60     112    8    8  8  8  8
 61     118    8    8  8  8  8
 62     120    8    7  8  7  8
 63     126    8    7  7  7  7
 64     131    8    8  8  8  8
 65     132    8    7  7  7  7
 66     133    8    8  8  8  8
 67     134    8    8  8  8  8
 68     137    8    8  7  8  7
 69     139    8    8  8  8  8
 70     140    8    7  7  7  7
 71     143    8    8  8  8  8
 72     148    8    8  8  8  8
 73     150    8    8  8  8  8
 74     151    8    8  8  8  8
 75     153    8    7  7  7  7
 76     155    8    8  8  8  8
 77     158    8    8  8  8  8
 78     160    8    7  7  7  7
 79     165    8    8  7  8  7
 80     282    8    7  7  7  7
 81     283    8    7  7  7  7
 82     284    8    7  7  7  7
 83     285    8    8  8  8  8
 84     287    8    8  8  8  8
 85     288    8    8  8  8  8
 86     289    8    8  8  8  8
 87     290    8    8  8  8  8
 88     297    8    8  8  8  8
 89     298    8    8  8  8  8
 90     299    8    7  7  7  7
 91     304    8    7  7  7  7
 92     311    8    8  8  8  8
 93     312    8    7  7  7  7
 94     313    8    8  8  8  8
 95     315    8    7  7  7  7
 96     317    8    7  7  7  7
 97     318    8    8  8  8  8
 98     322    8    7  7  7  7
 99     323    8    7  7  7  7
100     324    8    7  7  7  7
101     329    8    7  7  7  7
102     338    8    8  8  8  8
103     341    8    8  8  8  8
104     342    8    7  7  7  7
105     343    8    7  7  7  7
106     345    8    7  7  7  7
107     351    8    8  8  8  8

Note. The variable labels are:

LCFCL3 'LCF classification with 3 preds'
LCFCL4 'LCF classification with 4 preds'
LDFCL3 'LDF classification with 3 preds'
LDFCL4 'LDF classification with 4 preds'

For the present example, for both the 3 and the 4 response
variable analyses, the LDF and the LCF scores classified all 107
persons into the same groups. This need not have happened, but
will happen as the covariance matrices approach equality across
groups.

However, in both the LDF and the LCF analyses, 9 persons were
classified differently across these two analyses; these cases are
underlined within the table. In both the LDF and the LCF analyses,
the use of 4 rather than 3 response variables (a) correctly changed
the predicted classification of 3 people (denoted with plus signs
in the table), (b) incorrectly changed the predicted classification
of 6 people (denoted with minus signs in the table), thus (c)
resulting in a net worsening from using more information for
prediction as regards 3 persons.
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Table 8 

Incorrect and Correct Statistical Tests for Two Steps of Stepwise Analysis
Involving k=3 Groups and n=120 People

Incorrect Step #1
For k=3, p=1
df numerator   = k - 1 = 2
df denominator = n - k = 117
lambda = 0.79270
$F_{exact} = \frac{1-\Lambda}{\Lambda}\cdot\frac{n-k}{k-1} = \frac{1-0.79270}{0.79270}\cdot\frac{117}{2} = \frac{0.20730}{0.79270}\cdot 58.5 = 0.26151 \times 58.5$
F exact = 15.29841
p calculated = .0000012

Correct Step #1
For k=3, p=50
df numerator   = 2p = 100
df denominator = 2(n - p - 2) = 136
lambda = 0.79270
$F_{exact} = \frac{1-\Lambda^{1/2}}{\Lambda^{1/2}}\cdot\frac{n-p-2}{p} = \frac{1-0.79270^{1/2}}{0.79270^{1/2}}\cdot\frac{136}{100} = \frac{1-0.89034}{0.89034}\cdot\frac{136}{100} = \frac{0.10966}{0.89034}\cdot 1.36 = 0.12317 \times 1.36$
F exact = 0.16751
p calculated = 1.00000

Incorrect Step #2
For k=3, p=2
df numerator   = 2(k - 1) = 4
df denominator = 2(n - k - 1) = 232
lambda = 0.65540
$F_{exact} = \frac{1-\Lambda^{1/2}}{\Lambda^{1/2}}\cdot\frac{n-k-1}{k-1} = \frac{1-0.65540^{1/2}}{0.65540^{1/2}}\cdot\frac{232}{4} = \frac{1-0.80957}{0.80957}\cdot\frac{232}{4} = \frac{0.19043}{0.80957}\cdot 58 = 0.23523 \times 58$
F exact = 13.64322
p calculated = .0000945

Correct Step #2
For k=3, p=50
df numerator   = 2p = 100
df denominator = 2(n - p - 2) = 136
lambda = 0.65540
$F_{exact} = \frac{1-\Lambda^{1/2}}{\Lambda^{1/2}}\cdot\frac{n-p-2}{p} = \frac{1-0.65540^{1/2}}{0.65540^{1/2}}\cdot\frac{136}{100} = \frac{1-0.80957}{0.80957}\cdot\frac{136}{100} = \frac{0.19043}{0.80957}\cdot 1.36 = 0.23523 \times 1.36$
F exact = 0.31991
p calculated = 1.00000

Note. The formulae for degrees of freedom and F are presented by
Tatsuoka (1971, pp. 88-89).
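
The arithmetic in Table 8 can be checked with a few lines of code. The following is only a minimal sketch in Python (the paper's own programs are in SPSS); the function names are hypothetical, and the three formulas are simply the special cases shown in the table, which the table note attributes to Tatsuoka (1971, pp. 88-89). Only F and its degrees of freedom are computed; the p values would require an F-distribution routine and are left out.

```python
import math

N_PEOPLE = 120  # n in Table 8
N_GROUPS = 3    # k in Table 8


def exact_f_one_variable(lam, n, k):
    """Exact F for p = 1 response variable; df = (k - 1, n - k)."""
    f = (1 - lam) / lam * (n - k) / (k - 1)
    return f, (k - 1, n - k)


def exact_f_two_variables(lam, n, k):
    """Exact F for p = 2 response variables; df = (2(k - 1), 2(n - k - 1))."""
    root = math.sqrt(lam)
    f = (1 - root) / root * (n - k - 1) / (k - 1)
    return f, (2 * (k - 1), 2 * (n - k - 1))


def exact_f_three_groups(lam, n, p):
    """Exact F for k = 3 groups and p response variables; df = (2p, 2(n - p - 2))."""
    root = math.sqrt(lam)
    f = (1 - root) / root * (n - p - 2) / p
    return f, (2 * p, 2 * (n - p - 2))


# Step 1 (lambda = 0.79270): test as if p = 1 versus test with p = 50, as in Table 8.
incorrect_1 = exact_f_one_variable(0.79270, N_PEOPLE, N_GROUPS)
correct_1 = exact_f_three_groups(0.79270, N_PEOPLE, p=50)

# Step 2 (lambda = 0.65540): test as if p = 2 versus test with p = 50.
incorrect_2 = exact_f_two_variables(0.65540, N_PEOPLE, N_GROUPS)
correct_2 = exact_f_three_groups(0.65540, N_PEOPLE, p=50)

for label, (f, df) in [
    ("Incorrect step 1", incorrect_1),
    ("Correct step 1", correct_1),
    ("Incorrect step 2", incorrect_2),
    ("Correct step 2", correct_2),
]:
    print(f"{label}: F = {f:.5f}, df = {df}")
```

The same lambda produces a very large F when only the entered variables are counted and a trivially small F when p = 50 is used, which is the contrast the table is built to show.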
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Table 9 

Heuristic Data Illustrating That 
Stepwise Methods Do Not Identify the Best Variable Set 



ID Grp 
1 1 
2 1 

3 1 

4 1 

5 1 

6 1 

7 1 

8 1 

9 1 

10 1 
11 1 
12 1 

13 1 

14 1 

15 1 

16 1 

17 1 

18 1 

19 1 

20 1 
21 1 
22 1 

23 1 

24 1 

25 1 

26 1 

27 1 

28 1 

29 1 

30 1 

31 1 

32 1 

33 1 

34 1 

35 1 

36 1 

37 1 

38 1 

39 1 

40 1 

41 2 

42 2 

43 2 

44 2 

45 2 

46 2 



XI 

30.202 
36.268 
39.381 
32.511 
42.809 
54.841 
32 . 669 
36.884 
49.781 
51.618 
51.375 
55.102 
33.286 
31.384 
50.000 
39.322 
41.290 
48.098 
61.910 
50.028 
34.585 
57.834 
49.760 
26.010 
23.075 
34 . 310 
54.714 
60.945 
44.667 
48.442 
38.796 
47.693 
44.497 
55.224 
50.654 
42 . 632 
50.753 
43.564 
34 . 850 
50.408 
47.213 
34 . 168 
58.639 
38.730 
51.596 
62.621 



X2 

46.146 

44.816 
30.775 
26.201 
39.137 
32.072 
51.460 
45.926 
42.148 
44.373 
43.457 
46.903 
38.660 
41.336 
50.275 
56.273 
47.550 
45.198 
27.474 
51.954 
44.304 
49.899 

29.312 

60.816 
57.059 
44.277 
41.616 
43.890 
52.236 
57.685 
49.830 
43.561 
53.306 
62.785 
26.676 

54.313 
54.410 
42.998 
58.913 
43.214 
37 . 836 
33.221 
27 .033 
49.495 
53.009 
39.735 



X3 

36.393 
46.370 
32.532 
35.776 
40.845 
32 . 474 
55.332 
29.255 
43.681 
41.579 
55.160 
44.780 
39.553 
36.259 
61.363 
55.674 
38.913 
38.960 
38.298 
50.832 
36.311 
49.276 
44.098 
58.574 
48 .307 
34.315 
51.413 
44.886 
53.525 
57.240 
34.957 
28.529 
41.543 
58.527 
40.851 
49.072 
45.739 
39.366 
64.975 
43.598 
44.151 
29.149 
48.206 
48.813 
51.326 
52.727 



X4 

44.268 

42.663 

31.966 

40.843 

47.970 

52.689 

40.989 

44.400 

37.719 

48.125 

35.306 

44.669 

32.117 

44.751 

33.207 

34.216 

63.592 

58.692 

46.657 

44.419 

46.899 

50.643 

61.037 

31.081 

40.710 

52.634 

52.284 

40.360 

51.628 

34.324 

45.241 

52.057 

46.079 

32.167 

30.122 

34.758 
59.575 
51.515 
39.955 
59.859 
50.418 
46.838 
52.029 
48.258 

45.759 
71.905 
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47 


2 


51.737 


37 


48 


2 


43.922 


61 


49 


2 


42.726 


54 


50 


2 


44.939 


48 


51 


2 


42.050 


59 


52 


2 


37.950 


63 


53 


2 


46.938 


56 


54 


2 


59.976 


53 


55 


2 


59.651 


46 


56 


2 


61.465 


36 


57 


2 


51.051 


46 


58 


2 


40.534 


43 


59 


2 


48.756 


53 


60 


2 


69.683 


38 


61 


2 


46.532 


48 


62 


2 


47.390 


33 


63 


2 


45.617 


69 


64 


2 


56.300 


47 


65 


2 


36.826 


69 


66 


2 


55.413 


49 


67 


2 


52.831 


56 


68 


2 


53.087 


46 


69 


2 


47.221 


57 


70 


2 


54.653 


57 


71 


2 


51.779 


65 


72 


2 


46.009 


52 


73 


2 


52.968 


48 


74 


2 


43.296 


45 


75 


2 


55.779 


55 


76 


2 


55.410 


62 


77 


2 


51.454 


57 


78 


2 


48.538 


44 


79 


2 


62.931 


45 


80 


2 


68.626 


47 


81 


3 


40.113 


52 


82 


3 


63.539 


41 


83 


3 


45.115 


61 


84 


3 


36.029 


43 


85 


3 


51.691 


31 


86 


3 


66.255 


59 


87 


3 


54.119 


53 


88 


3 


49.996 


64 


89 


3 


60.048 


59 


90 


3 


46.350 


50 


91 


3 


49.121 


60 


92 


3 


48.088 


68 


93 


3 


52.787 


59 


94 


3 


44.986 


41 


95 


3 


55.269 


68 


96 


3 


50.261 


47 


97 


3 


56.321 


57 


98 


3 


50.766 


49 


99 


3 


65.540 


45 



45.013 


38.552 


55.784 


55.129 


54.281 


37.671 


36.004 


64.368 


61.987 


63.012 


55.519 


35.175 


65.436 


48.823 


51.431 


54.273 


58.262 


48.909 


45.301 


63.513 


51.258 


43.695 


40.944 


50.941 


56.950 


39.971 


49.262 


37.572 


49.324 


62.440 


28.706 


53.079 


56.763 


51.743 


57.178 


51.941 


62.206 


60.214 


48.629 


43.843 


56.712 


45.976 


48.024 


43.155 


52.413 


48.072 


51.724 


48.850 


66.259 


46.466 


48.452 


54.614 


50.156 


50.077 


45.162 


58.516 


59.676 


23.961 


58.090 


48.973 


54.929 


45.531 


49.021 


49.085 


53.116 


54.326 


49.993 


70.532 


50.289 


49.856 


46.398 


59.927 


65.551 


61.702 


38.991 


45.273 


41.387 


55.789 


45.930 


63.253 


57.157 


56.673 


63.878 


61.408 


61.433 


41.806 


59.540 


57.780 


44.200 


69.682 


59.637 


51.042 


61.506 


46.042 


39.170 


43.529 


59.191 


60.153 


44.830 


54.833 


59.734 


51.043 


54.050 


50.134 


58.401 


54.444 


CO 





667 

284 

703 

408 

340 

446 

395 

046 

707 

292 

853 

357 

468 

471 

917 

825 

776 

684 

819 

488 

210 

471 

142 

012 

569 

845 

023 

937 

454 

863 

612 

353 

867 

541 

329 

711 

546 

581 

516 

021 

613 

174 

992 

215 

275 

394 

393 

866 

Oil 

608 

470 

361 

512 
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100 


3 


47.305 


63 . 


101 


3 


61.232 


52 . 


102 


3 


43.688 


54 . 


103 


3 


74.301 


49 . 


104 


3 


46.216 


55 . 


105 


3 


50.882 


46 . 


106 


3 


48.898 


58 . 


107 


3 


60.911 


60 . 


108 


3 


60.918 


49 . 


109 


3 


49.932 


65 . 


110 


3 


55.415 


61 . 


111 


3 


66.505 


36 . 


112 


3 


59.574 


52 . 


113 


3 


62.806 


42 . 


114 


3 


55.761 


68 . 


115 


3 


73.150 


46 . 


116 


3 


56.814 


60 . 


117 


3 


50.092 


65 . 


118 


3 


65.086 


58 . 


119 


3 


57.997 


66 . 


120 


3 


73.867 


46 . 



55.889 


44.630 


59.623 


49.975 


54.662 


44.419 


45.461 


64.624 


43.794 


70.389 


42.779 


48.925 


56.452 


60.881 


62.039 


62.825 


43.208 


48.960 


79.812 


53.265 


64.733 


49.648 


41.958 


60.718 


63.181 


60.637 


51.890 


57.537 


60.399 


52.615 


38.224 


77.559 


64.211 


40.352 


44.826 


54.327 


62.482 


48.116 


58.486 


63.017 


70.118 


61.087 



725 

462 

287 

445 

Oil 

326 

229 

077 

582 

463 

860 

375 

291 

934 

426 

255 

450 

513 

518 

886 

347 
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Table 10 

Heuristic Results Illustrating That 
Stepwise Methods Do Not Identify the Best Variable Set: 

The DDA Stepwise Results

AT STEP 1, X1 WAS INCLUDED IN THE ANALYSIS.

                              DEGREES OF FREEDOM    SIGNIF.   BETWEEN GROUPS
WILKS' LAMBDA     0.79270     1    2    117.0
EQUIVALENT F     15.2988           2    117.0       0.0000

AT STEP 2, X2 WAS INCLUDED IN THE ANALYSIS.

                              DEGREES OF FREEDOM    SIGNIF.   BETWEEN GROUPS
WILKS' LAMBDA     0.65540     2    2    117.0
EQUIVALENT F     13.6432           4    232.0       0.0000

CANONICAL DISCRIMINANT FUNCTIONS

                      PERCENT OF  CUMULATIVE  CANONICAL    :  AFTER
FUNCTION  EIGENVALUE  VARIANCE    PERCENT     CORRELATION  :  FUNCTION  WILKS' LAMBDA  CHI-SQUARED  D.F.  SIGNIFICANCE
                                                           :     0      0.6553991      49.223        4    0.0000
   1*      0.52461      99.85       99.85     0.5865949    :     1      0.9992265      0.90148E-01   1    0.7640
   2*      0.00077       0.15      100.00     0.0278119    :

Note. These results were extracted from the output created by applying
the Appendix C program to the Table 9 heuristic data.
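
The quantities in the canonical discriminant functions panel of Table 10 are tied together by standard identities, and a short sketch can make that concrete. The Python below is an illustration only, not the paper's program: the variable names are hypothetical, the eigenvalues are the rounded tabled values, and the use of Bartlett's chi-squared approximation is an assumption about how the tabled chi-squared was obtained (it does reproduce the tabled value to rounding).

```python
import math

n, k, p = 120, 3, 2               # people, groups, response variables (X1, X2)
eigenvalues = [0.52461, 0.00077]  # rounded eigenvalues as tabled

# Percent of between-groups variance carried by each function.
percent_of_variance = [100 * ev / sum(eigenvalues) for ev in eigenvalues]

# Canonical correlation for each function: Rc = sqrt(ev / (1 + ev)).
canonical_r = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]

# Wilks' lambda before any function is removed: product of 1 / (1 + ev).
wilks_lambda = 1.0
for ev in eigenvalues:
    wilks_lambda /= 1 + ev

# Bartlett's approximation (assumed here): chi2 = -[n - 1 - (p + k) / 2] ln(lambda).
chi_squared = -(n - 1 - (p + k) / 2) * math.log(wilks_lambda)
df = p * (k - 1)

print(f"percent of variance: {percent_of_variance[0]:.2f}, {percent_of_variance[1]:.2f}")
print(f"canonical correlations: {canonical_r[0]:.5f}, {canonical_r[1]:.5f}")
print(f"Wilks' lambda = {wilks_lambda:.5f}, chi-squared = {chi_squared:.3f}, df = {df}")
```

Because the second eigenvalue is tabled to only five decimals, its canonical correlation is recovered less precisely than the first.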







Table 11

Heuristic Results Illustrating That
Stepwise Methods Do Not Identify the Best Variable Set:
The DDA All-Possible-Subsets Results

X1, X2
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.6553991      49.223          4      0.0000
                 0.9992265      0.90148E-01     1      0.7640
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .65540      13.64322      4.00        232.00      .000
                .99923        .09057      1.00        117.00      .764

X1, X3
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.6961866      42.189          4      0.0000
                 0.9988321      0.13614         1      0.7122
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .69619      11.51286      4.00        232.00      .000
                .99883        .13680      1.00        117.00      .712

X1, X4
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.7081264      40.208          4      0.0000
                 0.9991168      0.10294         1      0.7483
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .70813      10.92434      4.00        232.00      .000
                .99912        .10343      1.00        117.00      .748

X2, X3
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.8094569      24.627          4      0.0001
                 0.9913438      1.0128          1      0.3142
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .80946       6.46606      4.00        232.00      .000
                .99134       1.02162      1.00        117.00      .314

X2, X4
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.6966245      42.116          4      0.0000
                 0.9999445      0.64643E-02     1      0.9359
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .69662      11.49101      4.00        232.00      .000
                .99994        .00649      1.00        117.00      .936

X3, X4
DDA            WILKS' LAMBDA   CHI-SQUARED    D.F.   SIGNIFICANCE
                 0.6272538      54.336          4      0.0000
                 0.9973925      0.30417         1      0.5813
1-Way MANOVA   Wilks L.     F           Hypoth. DF   Error DF   Sig. of F
                .62725      15.23292      4.00        232.00      .000
                .99739        .30588      1.00        117.00      .581

Note. In addition to illustrating that the stepwise selection of variables
X1 and X2 as the first two variables is incorrect, since the lambda value
for X3 and X4 is better (.62725 vs. .65540), the tabled results also
illustrate that DDA and a one-way MANOVA are the same analysis, even though
the SPSS programmers made inconsistent choices of test statistics and the
number of decimals to report across these two analyses.
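
The comparison made in the note can be restated as a small lookup. The sketch below, in Python rather than the paper's SPSS, simply stores the six overall Wilks' lambda values tabled above and picks the smallest; the dictionary and variable names are hypothetical, and no discriminant analysis is rerun.

```python
# Overall Wilks' lambda for every two-variable subset, copied from Table 11.
lambda_by_pair = {
    ("X1", "X2"): 0.6553991,
    ("X1", "X3"): 0.6961866,
    ("X1", "X4"): 0.7081264,
    ("X2", "X3"): 0.8094569,
    ("X2", "X4"): 0.6966245,
    ("X3", "X4"): 0.6272538,
}

# Smaller lambda means a larger effect, so the best subset minimizes lambda.
best_pair = min(lambda_by_pair, key=lambda_by_pair.get)

# The pair that the stepwise analysis in Table 10 entered at steps 1 and 2.
stepwise_pair = ("X1", "X2")

print("best two-variable subset:", best_pair, lambda_by_pair[best_pair])
print("stepwise subset:", stepwise_pair, lambda_by_pair[stepwise_pair])
```

With only four candidate variables there are six pairs to examine, so an all-possible-subsets search is cheap here; the stepwise route examines fewer subsets and, for these data, misses the best one.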
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Table 12 

Heuristic Data Illustrating
the Context Specificity of GLM Weights

ID/             Response Variable^a
Stat.   Grp     X1       X2       X3       X4
  1      1       4        3        7       19
  2      1       4        4        4       17
  3      1       3        5        3       17
  4      1       2        6        4       19
  5      1       2        7        7       17
  6      1       4        8       12       12
  7      1       3        5        7       12
  8      2       5        1        6       12
  9      2       5        2        3       10
 10      2       4        3        2       10
 11      2       3        4        3       12
 12      2       3        5        6       10
 13      2       5        6       11        5
 14      2       4        3        6        5
 15      3       6        2        5        7
 16      3       6        3        2        5
 17      3       5        4        1        5
 18      3       4        5        2        7
 19      3       4        6        5        5
 20      3       6        7       10        0
 21      3       5        4        5        0
 M1            3.143    5.429    6.286   16.143
 M2            4.143    3.429    5.286    9.143
 M3            5.143    4.429    4.286    4.143
 SD1           0.899    1.718    3.039    2.968^b
 SD2           0.899    1.718    3.039    2.968
 SD3           0.899    1.718    3.039    2.968

Covariance matrix for group 1 (n=7)
 X1     .8095
 X2    -.5714    2.9524
 X3     .9524    2.8571    9.2381
 X4    -.6905   -2.4048   -5.8810    8.8095^b

Covariance matrix for group 2 (n=7)
 X1     .8095
 X2    -.5714    2.9524
 X3     .9524    2.8571    9.2381
 X4    -.6905   -2.4048   -5.8810    8.8095

Covariance matrix for group 3 (n=7)
 X1     .8095
 X2    -.5714    2.9524
 X3     .9524    2.8571    9.2381
 X4    -.6905   -2.4048   -5.8810    8.8095

Pooled within-groups covariance matrix (n=21)^c
 X1     .8095
 X2    -.5714    2.9524
 X3     .9524    2.8571    9.2381
 X4    -.6905   -2.4048   -5.8810    8.8095

aera9801.wk1 3/20/98

^a The "response variables" in a discriminant analysis are the
intervally-scaled variables. In a DDA the response variables are
the intervally-scaled criterion variables being predicted by group
membership data. In a PDA the response variables are the
intervally-scaled predictor variables predicting group membership.

^b The variance on the diagonal of the variance/covariance matrix is
the square of the SD of the variable (e.g., 2.968^2 = 8.8095), and
the SD of the variable is the square root of the variance of the
variable (e.g., 8.8095^.5 = 2.968).

^c Because here the group sizes are equal and the variance-covariance
matrices computed separately "within" each group are also exactly
equal (statisticians call this "homogeneity" of the covariance
matrices; it sounds more sophisticated than simply [clearly] saying
these matrices are equal), the weighted average of the covariance
matrices (called the "pooled" covariance matrix) also equals each
of the three separate group covariance matrices.
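
The two table footnotes about variances and pooling can be verified directly from the Table 12 scores. What follows is a minimal sketch in plain Python, not the paper's SPSS run; the helper names `covariance_matrix` and `pooled_covariance` are hypothetical, and pooling is done in the usual way, as the degrees-of-freedom-weighted average of the within-group matrices.

```python
import math

# Table 12 scores: each row is (X1, X2, X3, X4) for one person.
groups = {
    1: [(4, 3, 7, 19), (4, 4, 4, 17), (3, 5, 3, 17), (2, 6, 4, 19),
        (2, 7, 7, 17), (4, 8, 12, 12), (3, 5, 7, 12)],
    2: [(5, 1, 6, 12), (5, 2, 3, 10), (4, 3, 2, 10), (3, 4, 3, 12),
        (3, 5, 6, 10), (5, 6, 11, 5), (4, 3, 6, 5)],
    3: [(6, 2, 5, 7), (6, 3, 2, 5), (5, 4, 1, 5), (4, 5, 2, 7),
        (4, 6, 5, 5), (6, 7, 10, 0), (5, 4, 5, 0)],
}


def covariance_matrix(rows):
    """Sample variance/covariance matrix (divisor n - 1) for one group."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def pooled_covariance(group_rows):
    """Average of the within-group matrices, weighted by each group's n - 1."""
    p = len(group_rows[0][0])
    total_df = sum(len(rows) - 1 for rows in group_rows)
    weighted = [(len(rows) - 1, covariance_matrix(rows)) for rows in group_rows]
    return [[sum(df * cov[i][j] for df, cov in weighted) / total_df
             for j in range(p)] for i in range(p)]


cov_by_group = {g: covariance_matrix(rows) for g, rows in groups.items()}
pooled = pooled_covariance(list(groups.values()))

variance_x4 = cov_by_group[1][3][3]  # diagonal entry for X4 in group 1
sd_x4 = math.sqrt(variance_x4)       # SD is the square root of the variance
print(f"variance of X4 = {variance_x4:.4f}, SD of X4 = {sd_x4:.3f}")
print(f"pooled covariance of X1 and X2 = {pooled[0][1]:.4f}")
```

The three groups differ only by a constant shift on each variable, which is why their covariance matrices, and therefore the pooled matrix, coincide.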
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Table 13 

Heuristic Results Illustrating
the Context Specificity of GLM Weights

Weights in the Context of 3 Response Variables

Standardized canonical discriminant function coefficients
          Func 1      Func 2
 X1       1.50086     -.01817
 X2       1.25012     1.16078
 X3      -1.37261     -.44995

Structure matrix
          Func 1      Func 2
 X1        .56076     -.60392*
 X2       -.05557      .92134*
 X3       -.16600      .17877*

Weights in the Context of 4 Response Variables

Standardized canonical discriminant function coefficients
          Func 1      Func 2
 X1       -.47343     1.22249
 X2       -.12685     1.77579
 X3       1.09588    -1.04760
 X4       1.16456      .56180

Structure matrix
          Func 1      Func 2
 X1       -.34600*     .05602
 X2        .09855      .48590
 X3        .10242*    -.01658
 X4        .63238*     .09130
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Table 14 

Heuristic Data Set #1 Illustrating
the Importance of Both Function and Structure Coefficients

ID/             Response Variable*
Stat.   Grp     X1        X2        X3
  1      1       0        13        13
  2      1       4         6        18
  3      1       2         9        33
  4      1       6         4         8
  5      1       8         3        13
  6      1      10         3        25
  7      1      12         4        30
  8      1      14         6        20
  9      1      18        13        25
 10      1      16         9         5
 11      2       1        14         9
 12      2       5         7        14
 13      2       3        10        29
 14      2       7         5         4
 15      2       9         4         9
 16      2      11         4        21
 17      2      13         5        26
 18      2      15         7        16
 19      2      19        14        21
 20      2      17        10         1
 21      3       3        11        10
 22      3       7         4        15
 23      3       5         7        30
 24      3       9         2         5
 25      3      11         1        10
 26      3      13         1        22
 27      3      15         2        27
 28      3      17         4        17
 29      3      21        11        22
 30      3      19         7         2
 M1            9.000     7.000    19.000
 M2           10.000     8.000    15.000
 M3           12.000     5.000    16.000
 SD1           5.745     3.633     8.832
 SD2           5.745     3.633     8.832
 SD3           5.745     3.633     8.832

aera9803.wk1 3/21/98

*The "response variables" in a discriminant analysis are the
intervally-scaled variables. In a DDA the response variables are
the intervally-scaled criterion variables being predicted by group
membership data. In a PDA the response variables are the
intervally-scaled predictor variables predicting group membership.
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Table 15 

Heuristic Results #1 Illustrating
the Importance of Both Function and Structure Coefficients

STANDARDIZED CANONICAL DISCRIMINANT FUNCTION COEFFICIENTS
          FUNC 1       FUNC 2
 X1      -0.50132     -0.42337
 X2       0.86161     -0.32427
 X3       0.07938      0.84594

STRUCTURE MATRIX
          FUNC 1       FUNC 2
 X1      -0.50132*    -0.42337
 X2       0.86161*    -0.32427
 X3       0.07938      0.84594
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Table 16 

Heuristic Data Set #2 Illustrating 
the Importance of Both Function and Structure Coefficients 



ID/ Response Variable 

ID Grp XI X3 



1 1 
2 1 

3 1 

4 1 

5 1 

6 1 

7 1 

8 1 

9 1 

10 1 
11 1 
12 1 

13 1 

14 1 

15 1 

16 1 

17 1 

18 1 

19 1 

20 1 
21 1 
22 1 

23 1 

24 1 

25 1 

26 1 

27 1 

28 1 

29 1 

30 1 

31 1 

32 1 

33 1 

34 1 

35 1 

36 1 

37 1 

38 1 

39 1 

40 1 

41 2 

42 2 

43 2 

44 2 

45 2 



29.504 

35.377 

38.646 

32.166 

42.123 
53.744 
32.359 
36.474 
48.948 
50.738 
50.535 
54.179 
33.117 
31.286 
49.303 

39.003 
40.929 
47.503 
60.888 
49.430 
34.541 

57.003 
49.220 

26.350 
23.518 
34.368 
54.078 

60.099 
44.404 
48.057 
38.759 

47.351 
44.271 
54.653 
50.281 
42.553 
50.407 
43.467 
35.070 

50.100 
47.031 
34.440 

58.123 
38.938 
51.361 



42.923 

40.427 

30.333 

29.527 

37.132 

28.508 

49.590 

44.465 

38.320 

39.708 

39.256 

41.307 

40.453 

43.078 

45.602 

53.124 

45.935 

42.345 

25.192 

47.577 

45.680 

44.198 

30.174 

61.440 

59.255 
46.340 
39.067 
39.284 
50.167 
53.530 
49.947 
42.744 

51.256 
56.111 
29.187 
53.080 
51.156 
44.039 
58.895 
42.617 
39.315 
39.074 
28.294 
51.302 
50.766 



29.576 
37.666 
29.319 
29.132 
37.234 
35.073 
44.558 
29.162 
41.963 

41.576 
49.973 
45.286 
33.975 
31.644 
54.567 
48.315 
37.544 
39.500 
41.614 
48.575 

33.675 
50.022 
41.746 
47.090 
39.290 

32.676 
49.668 
47.905 

49.101 
53.324 
35.416 
33.542 
41.813 
57.113 
40.420 
46.333 
46.945 
39.270 
54.395 
44.246 
42.967 
28.846 

48.125 
44.884 
51.048 




8S 



46 


2 


62 . 


47 


2 


51 , 


48 


2 


44 . 


49 


2 


42 . 


50 


2 


45 , 


51 


2 


42 . 


52 


2 


38 . 


53 


2 


46 . 


54 


2 


59 , 


55 


2 


59 , 


56 


2 


61 , 


57 


2 


50 


58 


2 


40 


59 


2 


48 


60 


2 


69 


61 


2 


46 


62 


2 


47 


63 


2 


45 


64 


2 


56 


65 


2 


37 


66 


2 


55 


67 


2 


52 


68 


2 


53 


69 


2 


47 


70 


2 


54 


71 


2 


51 


72 


2 


46 


73 


2 


53 


74 


2 


43 


75 


2 


55 


76 


2 


55 


77 


2 


51 


78 


2 


48 


79 


2 


62 


80 


2 


68 


81 


3 


40 


82 


3 


63 


83 


3 


45 


84 


3 


36 


85 


3 


51 


86 


3 


66 


87 


3 


54 


88 


3 


50 


89 


3 


60 


90 


3 


46 


91 


3 


49 


92 


3 


48 


93 


3 


53 


94 


3 


45 


95 


3 


55 


96 


3 


50 


97 


3 


56 


98 


3 


51 
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37.532 


53.917 


38.750 


45.354 


59.607 


52.539 


54.788 


50.448 


49.405 


39.081 


58.822 


55.714 


63.250 


50.925 


55.456 


59.208 


49.553 


54.338 


44.703 


57.759 


36.092 


49.032 


47.080 


50.649 


47.054 


40.398 


53.000 


54.324 


36.061 


54 . 631 


50.368 


48.506 


38.375 


34.157 


67.118 


55.424 


47.024 


56.497 


69.471 


56.009 


48.748 


51.137 


54.825 


56.223 


47.373 


49.864 


57.273 


51.987 


55.468 


54.064 


63.035 


63.285 


54.534 


48.788 


49.063 


51.585 


49.927 


45.095 


54.218 


59.348 


60.159 


59.143 


57.054 


55.261 


47.487 


49.094 


45.031 


56.524 


44.920 


56.643 


56.076 


48.200 


41.788 


52.130 


62.146 


60.511 


50.628 


38.987 


37.143 


44.118 


55.171 


55.063 


54.081 


57.507 


63.482 


61.656 


57.666 


62.915 


53.651 


56.202 


60.881 


48.760 


67.626 


59.086 


59.407 


60.727 


47.690 


42.188 


65.603 


61.156 


50.874 


48.134 


57.215 


60.662 


52.479 


54.457 
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oil 

508 

008 

859 

016 

240 

303 

990 

590 

283 

041 

987 

836 

792 

031 

701 

542 

837 

153 

357 

308 

835 

123 

460 

667 

908 

339 

074 

738 

794 

442 

622 

822 

728 

232 

712 

342 

558 

825 

950 

036 

325 

350 

064 

860 

541 

559 

106 

581 

511 

678 

552 

219 
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99 


3 


65.509 


45.815 


61 


100 


3 


47.918 


64.920 


56 


101 


3 


61.419 


52.850 


62 


102 


3 


44.501 


58.930 


53 


103 


3 


74.096 


47.502 


57 


104 


3 


46.979 


59.091 


47 


105 


3 


51.521 


51.319 


47 


106 


3 


49.607 


61.195 


57 


107 


3 


61.220 


59.648 


64 


108 


3 


61.227 


51.405 


51 


109 


3 


50.665 


67.000 


73 


110 


3 


55.987 


62.945 


65 


111 


3 


66.710 


40.179 


51 


112 


3 


60.045 


54.641 


64 


113 


3 


63.249 


47.013 


57 


114 


3 


56.559 


69.593 


64 


115 


3 


73.365 


47.814 


53 


116 


3 


57.605 


63.250 


66 


117 


3 


51.241 


69.792 


52 


118 


3 


65.732 


60.535 


67 


119 


3 


58.946 


69.314 


64 


120 


3 


74.402 


49.980 


74 




583 

355 

042 

558 

292 

819 

794 

117 

766 

763 

107 

147 

533 

535 

722 

217 

417 

086 

887 

970 

397 

818 
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Table 17 

Heuristic Results #2 Illustrating
the Importance of Both Function and Structure Coefficients

STANDARDIZED CANONICAL DISCRIMINANT FUNCTION COEFFICIENTS
          FUNC 1       FUNC 2
 X1       0.93660      1.07729
 X2       0.95259      1.43338
 X3      -0.05507     -1.70996

STRUCTURE MATRIX
          FUNC 1       FUNC 2
 X1       0.54141*    -0.28008
 X2       0.56453*     0.24316
 X3       0.81431*    -0.55744
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Table 18 

Heuristic Data Set #3 Illustrating 
the Importance of Both Function and Structure Coefficients 



ID/ Response Variable

 ID   Grp     X1        X2        X3
  1    1    31.107    41.920    44.130
  2    1    37.386    43.111    55.702
  3    1    40.301    29.292    40.991
  4    1    32.981    21.197    33.741
  5    1    43.659    38.767    49.266
  6    1    56.148    36.705    51.915
  7    1    33.099    46.916    53.419
  8    1    37.419    42.930    36.720
  9    1    50.786    44.536    56.564
 10    1    52.673    47.540    56.989
 11    1    52.384    46.404    65.522
 12    1    56.199    51.374    61.670
 13    1    33.543    33.607    37.093
 14    1    31.563    35.508    33.774
 15    1    50.840    52.470    68.917
 16    1    39.741    53.984    57.161
 17    1    41.753    45.846    44.518
 18    1    48.821    46.360    49.784
 19    1    63.104    34.040    55.862
 20    1    50.748    53.881    60.467
 21    1    34.687    39.388    34.741
 22    1    58.809    55.013    64.993
 23    1    50.412    30.512    48.941
 24    1    25.678    52.286    45.763
 25    1    22.628    47.189    35.085
 26    1    34.290    38.953    31.661
 27    1    55.465    44.953    60.297
 28    1    61.929    49.927    61.620
 29    1    45.002    51.401    55.457
 30    1    48.912    58.554    62.578
 31    1    38.876    46.354    36.464
 32    1    48.112    43.765    37.997
 33    1    44.788    52.322    46.752
 34    1    55.896    66.513    69.875
 35    1    51.101    27.581    43.654
 36    1    42.750    52.288    49.343
 37    1    51.167    55.825    53.826
 38    1    43.702    41.100    40.248
 39    1    34.634    53.552    54.444
 40    1    50.778    44.174    48.748
 41    2    47.442    37.269    44.752
 42    2    33.891    26.956    21.808
 43    2    59.232    30.951    53.661
 44    2    38.516    45.224    42.184
 45    2    51.874    54.318    55.888



46 


2 


63 , 


47 


2 


52 


48 


2 


43 


49 


2 


42 


50 


2 


44 . 


51 


2 
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Table 19 

Heuristic Results #3 Illustrating
the Importance of Both Function and Structure Coefficients

STANDARDIZED CANONICAL DISCRIMINANT FUNCTION COEFFICIENTS
          FUNC 1       FUNC 2
 X1       1.22956      0.28470
 X2       1.21174     -0.20978
 X3      -1.58393      0.89694

STRUCTURE MATRIX
          FUNC 1       FUNC 2
 X1       0.39129      0.82637*
 X2       0.38294      0.39748*
 X3      -0.03464      0.94557*
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Figure 1 

PDA Territorial Maps for the Table 5 Heuristic Data 
Illustrating That More Predictors 
May Actually Hurt Classification Accuracy 
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Predictor/Response Variables 
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Note . Although the DDA effect size always stays the same or gets 
better (i.e., smaller) as more response variables are added (for
these data, λ3 = 0.8094909 while λ4 = 0.8050684), the PDA hit rate
can get worse as response variables are added.
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APPENDIX A 

SPSS/LISREL Program Illustrating That
SEM is the Most General Case of the General Linear Model 
Using the Holzinger and Swineford (1939) Data 



TITLE 'CANLISRL.SPS Holzinger & Swineford (1939) Data'.
COMMENT *********************************************************************.
COMMENT Holzinger, K.J., & Swineford, F. (1939). A study in factor analysis:.
COMMENT The stability of a bi-factor solution (No. 48). Chicago, IL: .
COMMENT University of Chicago, (data on pp. 81-91).
COMMENT *********************************************************************.
SET BLANKS=SYSMIS UNDEFINED=WARN.
DATA LIST
  FILE=abc FIXED RECORDS=2 TABLE
  /1 id 1-3 sex 4-4 ageyr 6-7
  agemo 8-9 t1 11-12 t2 14-15 t3 17-18 t4 20-21 t5 23-24 t6 26-27 t7 29-30 t8
  32-33 t9 35-36 t10 38-40 t11 42-44 t12 46-48 t13 50-52 t14 54-56 t15 58-60
  t16 62-64 t17 66-67 t18 69-70 t19 72-73 t20 74-76 t21 78-79 /2 t22 11-12
  t23 14-15 t24 17-18 t25 20-21 t26 23-24 .

EXECUTE . 

COMPUTE SCHOOL=1.

IF (ID GT 200)SCHOOL=2. 

IF (ID GE 1 AND ID LE 85)GRADE=7. 

IF (ID GE 86 AND ID LE 168)GRADE=8. 

IF (ID GE 201 AND ID LE 281)GRADE=7. 

IF (ID GE 282 AND ID LE 351)GRADE=8. 

IF (ID GE 1 AND ID LE 44)TRACK=2. 

IF (ID GE 45 AND ID LE 85)TRACK=1. 

IF (ID GE 86 AND ID LE 129)TRACK=2. 

IF (ID GE 130)TRACK=1. 

PRINT FORMATS SCHOOL TO TRACK (F1.0).
VALUE LABELS SCHOOL (1) PASTEUR (2) GRANT-WHITE/
  TRACK (1) JUNE PROMOTIONS (2) FEB PROMOTIONS/.
VARIABLE LABELS T1 VISUAL PERCEPTION TEST FROM SPEARMAN VPT, PART III
T2 CUBES, SIMPLIFICATION OF BRIGHAM'S SPATIAL RELATIONS TEST
T3 PAPER FORM BOARD--SHAPES THAT CAN BE COMBINED TO FORM A TARGET
T4 LOZENGES FROM THORNDIKE--SHAPES FLIPPED OVER THEN IDENTIFY TARGET
T5 GENERAL INFORMATION VERBAL TEST
T6 PARAGRAPH COMPREHENSION TEST
T7 SENTENCE COMPLETION TEST
T8 WORD CLASSIFICATION--WHICH WORD NOT BELONG IN SET
T9 WORD MEANING TEST
T10 SPEEDED ADDITION TEST
T11 SPEEDED CODE TEST--TRANSFORM SHAPES INTO ALPHA WITH CODE

T12 SPEEDED COUNTING OF DOTS IN SHAPE 

T13 SPEEDED DISCRIM STRAIGHT AND CURVED CAPS 

T14 MEMORY OF TARGET WORDS 

T15 MEMORY OF TARGET NUMBERS 

T16 MEMORY OF TARGET SHAPES 

T17 MEMORY OF OBJECT-NUMBER ASSOCIATION TARGETS 

T18 MEMORY OF NUMBER-OBJECT ASSOCIATION TARGETS 

T19 MEMORY OF FIGURE-WORD ASSOCIATION TARGETS 

T20 DEDUCTIVE MATH ABILITY 

T21 MATH NUMBER PUZZLES 

T22 MATH WORD PROBLEM REASONING 

T23 COMPLETION OF A MATH NUMBER SERIES 

T24 WOODY-MCCALL MIXED MATH FUNDAMENTALS TEST 

T25 REVISION OF T3 — PAPER FORM BOARD 

T26 FLAGS — POSSIBLE SUBSTITUTE FOR T4 LOZENGES. 

SUBTITLE 'CCA ##############'. 

correlations variables=t6 t7 t2 t4 t20 t21 t22/
  statistics=all .
manova t6 t7 with t2 t4 t20 t21 t22/
  print=signif (multiv eigen dimenr)/
  discrim=stan cor alpha (.999) /design .

SUBTITLE 'Function I 2nd Variate n=301 v=7'. 
execute . 

PRELIS 

/VARIABLES 

t2 (CO) t4 (CO) t20 (CO) t21 (CO) t22 (CO) 
t6 (CO) t7 (CO) 

/TYPE=CORRELATION 
/MATRIX=OUT(CR1)

LISREL 

/"lb First Function n=301 v=7" 

/DA NI=7 NO=301 MA=KM 
/MATRIX=IN(CR1) 

/MO BE=ZE PS=ZE TD=ZE LX=ID LY=FU,FI TE=SY,FR 
GA=FU,FI PH=SY,FR NX=2 NY=5 NK=2 NE=1 
/VA 1.0 PH (1,1) PH (2, 2) 

/VA 1.0 LY(1,1) 

/FR LY(2,1) LY(3,1) LY(4,1) LY(5,1) 

/FR GA(1,1) GA(1,2) 

/OU SS FS SL=1 TM=1200 ND=5 
SUBTITLE 'Function I 1st Variate n=301 v=7'. 
execute . 

PRELIS 

/VARIABLES 
t6 (CO) t7 (CO) 

t2 (CO) t4 (CO) t20 (CO) t21 (CO) t22 (CO) 

/TYPE=CORRELATION 

/MATRIX=OUT ( CR2 ) 

LISREL 

/"la First Function n=301 v=7" 

/DA NI=7 NO=301 MA=KM 
/MATRIX=IN(CR2) 

/MO BE=ZE PS=ZE TD=ZE LX=ID LY=FU,FI TE=SY,FR 
GA=FU,FI PH=SY,FR NX=5 NY=2 NK=5 NE=1 
/VA 1.0 PH(1,1) PH(2,2) PH(3,3) PH(4,4) PH(5,5) 
/VA 1.0 LY(1,1) 

/FR LY(2,1) 

/FR GA(1,1) GA(1,2) GA(1,3) GA(1,4) GA(1,5) 

/OU SS FS SL=1 TM=1200 ND=5 

SUBTITLE 'Function II 2nd Variate n=301 v=7'. 
execute . 

LISREL 

/"2b Second Function n=301 v=7" 

/DA NI=7 NO=301 MA=KM 
/MATRIX=IN(CR1) 

/MO BE=ZE PS=ZE TD=ZE LX=ID LY=FU,FI TE=SY,FR 
GA=FU,FI PH=SY,FR NX=2 NY=5 NK=2 NE=2 
/VA 1.0 PH(1,1) PH(2,2) 

/VA 1.0 LY(1,1) LY(1,2) 

/VA 0.76757 LY(2,1) 

/VA 2.34225 LY(3,1) 

/VA 2.13559 LY(4,1) 

/VA 3.17417 LY(5,1) 

/FR LY(2,2) LY(3,2) LY(4,2) LY(5,2) 

/VA 0.06992 GA(1,1) 

/VA 0.09682 GA(1,2) 

/FR GA(2,1) GA(2,2) 

/OU SS FS SL=1 TM=1200 ND=5 
SUBTITLE 'Function II 1st Variate n=301 v=7'. 
execute . 

LISREL 

/"2a Second Function n=301 v=7" 

/DA NI=7 NO=301 MA=KM 







/MATRIX=IN(CR2) 

/MO BE=ZE PS=ZE TD=ZE LX=ID LY=FU,FI TE=SY,FR 
GA=FU,FI PH=SY,FR NX=5 NY=2 NK=5 NE=2 
/VA 1.0 PH(1,1) PH(2,2) PH(3,3) PH(4,4) PH(5,5) 
/VA 1.0 LY(1,1) LY(1,2) 

/VA 1.05093 LY(2,1) 

/FR LY(2,2) 

/VA -.00729 GA(1,1) 

/VA -.09934 GA(1,2) 

/VA 0.16926 GA(1,3) 

/VA 0.13288 GA(1,4) 

/VA 0.36285 GA(1,5) 

/FR GA(2,1) GA(2,2) GA(2,3) GA(2,4) GA(2,5) 

/OU SS FS SL=1 TM=1200 ND=5 
APPENDIX B 

SPSS Program for the Table 5 Actual Data Illustrating That 
More Predictors May Actually Hurt Classification Accuracy 



TITLE 'AERA9804.SPS Holzinger & Swineford (1939) Data *****'. 
COMMENT*********************************************************************** 
COMMENT Holzinger, K.J., & Swineford, F. (1939). A study in factor analysis: 
COMMENT The Stability of a bi-factor solution (No. 48). Chicago, IL: 
COMMENT University of Chicago. (data on pp. 81-91) 
COMMENT*********************************************************************** 
DATA LIST FILE=BT RECORDS=2 

/1 ID 1-3 SEX 4 AGEYR 6-7 AGEMO 8-9 

T1 11-12 T2 14-15 T3 17-18 T4 20-21 T5 23-24 T6 26-27 
T7 29-30 T8 32-33 T9 35-36 T10 38-40 T11 42-44 T12 46-48 
T13 50-52 T14 54-56 T15 58-60 T16 62-64 T17 66-67 
T18 69-70 T19 72-73 T20 74-76 T21 78-79 
/2 T22 11-12 T23 14-15 T24 17-18 
T25 20-21 T26 23-24 
COMPUTE SCHOOL=1 
IF (ID GT 200)SCHOOL=2 
IF (ID GE 1 AND ID LE 85)GRADE=7 
IF (ID GE 86 AND ID LE 168)GRADE=8 
IF (ID GE 201 AND ID LE 281)GRADE=7 
IF (ID GE 282 AND ID LE 351)GRADE=8 
IF (ID GE 1 AND ID LE 44)TRACK=2 
IF (ID GE 45 AND ID LE 85)TRACK=1 
IF (ID GE 86 AND ID LE 129)TRACK=2 
IF (ID GE 130)TRACK=1 
PRINT FORMATS SCHOOL TO TRACK (F1.0) 

VALUE LABELS SCHOOL (1)PASTEUR (2)GRANT-WHITE/ 

TRACK (1)JUNE PROMOTIONS (2)FEB PROMOTIONS/ 

VARIABLE LABELS T1 VISUAL PERCEPTION TEST FROM SPEARMAN VPT, PART III 
T2 CUBES, SIMPLIFICATION OF BRIGHAM'S SPATIAL RELATIONS TEST 
T3 PAPER FORM BOARD — SHAPES THAT CAN BE COMBINED TO FORM A TARGET 
T4 LOZENGES FROM THORNDIKE — SHAPES FLIPPED OVER THEN IDENTIFY TARGET 
T5 GENERAL INFORMATION VERBAL TEST 
T6 PARAGRAPH COMPREHENSION TEST 
T7 SENTENCE COMPLETION TEST 

T8 WORD CLASSIFICATION — WHICH WORD NOT BELONG IN SET 

T9 WORD MEANING TEST 

T10 SPEEDED ADDITION TEST 

T11 SPEEDED CODE TEST — TRANSFORM SHAPES INTO ALPHA WITH CODE 

T12 SPEEDED COUNTING OF DOTS IN SHAPE 

T13 SPEEDED DISCRIM STRAIGHT AND CURVED CAPS 

T14 MEMORY OF TARGET WORDS 

T15 MEMORY OF TARGET NUMBERS 

T16 MEMORY OF TARGET SHAPES 

T17 MEMORY OF OBJECT-NUMBER ASSOCIATION TARGETS 

T18 MEMORY OF NUMBER-OBJECT ASSOCIATION TARGETS 

T19 MEMORY OF FIGURE-WORD ASSOCIATION TARGETS 
T20 DEDUCTIVE MATH ABILITY 
T21 MATH NUMBER PUZZLES 
T22 MATH WORD PROBLEM REASONING 
T23 COMPLETION OF A MATH NUMBER SERIES 
T24 WOODY-MCCALL MIXED MATH FUNDAMENTALS TEST 
T25 REVISION OF T3 — PAPER FORM BOARD 
T26 FLAGS — POSSIBLE SUBSTITUTE FOR T4 LOZENGES 
subtitle '0 PDA with 3 Predictor Variables **n=301' 
discriminant groups=grade(7,8) / 

variables=t13 t17 t22/analysis=t13 t17 t22/ 
method=direct/priors=equal/save=scores(discrim)/ 
classify=pooled/ 

statistics=mean stddev gcov tcov corr boxm coef table/ 
plot=all 
select if (discrim1 lt -1.5 or discrim1 gt 1.5 
or (discrim1 gt -.3 and discrim1 lt .3)) 
sort cases by grade id 

list variables=id grade t13 t17 t22 t16/ 
cases=999/format=numbered 

subtitle '1 PDA with 3 Predictor Variables **n=107' 
discriminant groups=grade(7,8)/ 

variables=t13 t17 t22/analysis=t13 t17 t22/ 
method=direct/priors=equal/save=class(LDFCL3)/ 
classify=pooled/ 

statistics=mean stddev gcov tcov corr boxm coef table/ 
plot=all 

subtitle '2 PDA with 4 Predictor Variables **n=107' 
discriminant groups=grade(7,8)/ 

variables=t13 t17 t22 t16/analysis=t13 t17 t22 t16/ 
method=direct/priors=equal/save=class(LDFCL4)/ 
classify=pooled/ 

statistics=mean stddev gcov tcov corr boxm coef table/ 
plot=all 

subtitle '3 Compare the 4 Sets of Classification Results!!' 
compute lcf31=(T13 * 0.1091137) + (T17 * -0.06245298) 

+ (T22 * 0.1659288) + -12.84927 
compute lcf32=(T13 * 0.1117800) + (T17 * 0.06471948) 

+ (T22 * 0.2171317) + -15.91867 
compute lcf41=(T13 * -0.008489698) + (T17 * -0.5090838) 

+ (T22 * -0.09004268) + (T16 * 1.974350) + -97.20442 
compute lcf42=(T13 * -0.007301202) + (T17 * -0.3875236) 

+ (T22 * -0.04205625) + (T16 * 1.999159) + -102.4071 

compute LCFCL3=8 
if (lcf31 gt lcf32)LCFCL3=7 
compute LCFCL4=8 
if (lcf41 gt lcf42)LCFCL4=7 
print formats LCFCL3 LCFCL4 (F1) 
variable labels 

lcf31 'Linear Class Function (LCF) score #1 3 preds' 

lcf32 'Linear Class Function (LCF) score #2 3 preds' 

lcf41 'Linear Class Function (LCF) score #1 4 preds' 

lcf42 'Linear Class Function (LCF) score #2 4 preds' 

LCFCL3 'LCF classification 3 preds' 

LCFCL4 'LCF classification 4 preds' 

LDFCL3 'LDF classification 3 preds' 

LDFCL4 'LDF classification 4 preds' 

list variables=id grade LDFCL3 LDFCL4 LCFCL3 LCFCL4/ 
cases=9999/format=numbered 
crosstabs grade by LDFCL3 
crosstabs grade by LDFCL4 
crosstabs grade by LCFCL3 
crosstabs grade by LCFCL4 
subtitle '1 LDFCL3 and LCFCL3 <>' 
temporary 

select if (LDFCL3 ne LCFCL3) 

list variables=id grade LDFCL3 LDFCL4 LCFCL3 LCFCL4/cases=99 

subtitle '2 LDFCL4 and LCFCL4 <>' 

temporary 

select if (LDFCL4 ne LCFCL4) 

list variables=id grade LDFCL3 LDFCL4 LCFCL3 LCFCL4/cases=99 

subtitle '3 LDFCL4 and LDFCL3 <>' 

temporary 

select if (LDFCL4 ne LDFCL3) 

list variables=id grade LDFCL3 LDFCL4 LCFCL3 LCFCL4/cases=99 

subtitle '4 LCFCL4 and LCFCL3 <>' 

temporary 

select if (LCFCL4 ne LCFCL3) 

list variables=id grade LDFCL3 LDFCL4 LCFCL3 LCFCL4/cases=99 
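
The COMPUTE and IF statements in the program above assign each case to the grade whose linear classification function (LCF) score is larger. As a minimal sketch of that decision rule outside SPSS, the Python fragment below reuses the coefficients exactly as printed in the listing. The function name classify and the example predictor scores are hypothetical and are not taken from the Holzinger and Swineford data.

```python
# Linear classification functions (LCFs) from the COMPUTE statements above.
# Each entry is (weights, constant); weights are ordered as the predictors
# appear in the listing: T13, T17, T22 and, for the larger set, T16.
LCF3 = {
    7: ((0.1091137, -0.06245298, 0.1659288), -12.84927),   # lcf31
    8: ((0.1117800, 0.06471948, 0.2171317), -15.91867),    # lcf32
}
LCF4 = {
    7: ((-0.008489698, -0.5090838, -0.09004268, 1.974350), -97.20442),  # lcf41
    8: ((-0.007301202, -0.3875236, -0.04205625, 1.999159), -102.4071),  # lcf42
}


def classify(functions, predictors):
    """Return the predicted grade (7 or 8) for one case.

    Mirrors the SPSS logic: the default is grade 8, and grade 7 is
    assigned only when its LCF score is strictly greater.
    """
    scores = {
        grade: sum(w * x for w, x in zip(weights, predictors)) + constant
        for grade, (weights, constant) in functions.items()
    }
    return 7 if scores[7] > scores[8] else 8


# Hypothetical scores on T13, T17, T22 (not an actual case from the data).
example_case = (200, 10, 30)
print("3-predictor LCF classification:", classify(LCF3, example_case))
```

Running the same case through classify(LCF4, ...) requires a fourth score for T16, which is how the program compares the three- and four-predictor classifications case by case.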
APPENDIX C 

SPSS Program for the Table 9 Heuristic Data Illustrating 
that Stepwise Methods Do Not Identify the Best Variable Set 

title 'AERA9802.SPS ***********************************'. 
data list file=abc records=1 table/ 

ID grp x1 to x4 (2F4,4F9.3) 
list variables=all/cases=99 . 

subtitle '1 Stepwise DDA ###############################'. 
discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x1 to x4/ 
method=wilks/maxsteps=2/ 
statistics=mean stddev gcov cov boxm/ . 
subtitle '2a Enter X1,X2 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 
discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x1 x2/ 
method=direct/ . 

subtitle '2b X1,X2 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x1 x2 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
subtitle '3a Enter X1,X3 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 

discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x1 x3/ 
method=direct/ . 

subtitle '3b X1,X3 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x1 x3 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
subtitle '4a Enter X1,X4 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 

discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x1 x4/ 
method=direct/ . 

subtitle '4b X1,X4 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x1 x4 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
subtitle '5a Enter X2,X3 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 

discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x2 x3/ 
method=direct/ . 

subtitle '5b X2,X3 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x2 x3 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
subtitle '6a Enter X2,X4 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 

discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x2 x4/ 
method=direct/ . 

subtitle '6b X2,X4 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x2 x4 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
subtitle '7a Enter X3,X4 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'. 

discriminant 

groups=grp(1,3)/variables=x1 to x4/analysis=x3 x4/ 
method=direct/ . 

subtitle '7b X3,X4 Show 1-way MANOVA is DDA !!!!!!!!!!!!'. 
manova x3 x4 by grp(1,3)/print=signif(multiv eigen dimenr)/ 
discrim(stan corr alpha(.999))/design . 
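
The program above fits every one of the six possible two-variable subsets of x1 to x4 so that the stepwise result can be checked against an exhaustive search. The Python sketch below illustrates that all-possible-subsets logic by computing Wilks' lambda (the determinant of the within-groups SSCP matrix divided by the determinant of the total SSCP matrix) for each pair. The scores in DATA are made up for illustration and are NOT the Table 9 heuristic data; the function names are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical scores for three groups on x1..x4 (NOT the Table 9 data).
DATA = {
    1: [(1, 5, 2, 7), (2, 4, 3, 9), (3, 6, 1, 8)],
    2: [(4, 5, 6, 7), (5, 7, 5, 6), (6, 6, 7, 9)],
    3: [(7, 4, 9, 8), (8, 6, 8, 7), (9, 5, 10, 9)],
}


def sscp(rows, a, b):
    """Sums of squares and cross-products of columns a and b about their means."""
    n = len(rows)
    mean_a = sum(r[a] for r in rows) / n
    mean_b = sum(r[b] for r in rows) / n
    sxx = sum((r[a] - mean_a) ** 2 for r in rows)
    sxy = sum((r[a] - mean_a) * (r[b] - mean_b) for r in rows)
    syy = sum((r[b] - mean_b) ** 2 for r in rows)
    return sxx, sxy, syy


def det2(matrix):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = matrix
    return sxx * syy - sxy ** 2


def wilks_lambda(data, a, b):
    """Wilks' lambda = det(W) / det(T) for the variable pair (a, b)."""
    all_rows = [row for rows in data.values() for row in rows]
    total = sscp(all_rows, a, b)
    # W is the element-wise sum of the per-group SSCP matrices.
    within = tuple(
        sum(parts) for parts in zip(*(sscp(rows, a, b) for rows in data.values()))
    )
    return det2(within) / det2(total)


# Evaluate all six two-variable subsets; a smaller lambda means better separation.
results = {
    (a + 1, b + 1): wilks_lambda(DATA, a, b) for a, b in combinations(range(4), 2)
}
best_pair = min(results, key=results.get)

for pair, lam in sorted(results.items()):
    print(f"x{pair[0]}, x{pair[1]}: Wilks lambda = {lam:.4f}")
print("pair with the smallest lambda:", best_pair)
```

Whether the pair chosen by a stepwise run matches best_pair depends on the data, which is the comparison the Appendix C program makes for Table 9.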
APPENDIX D 

SPSS for Windows Program 
for the Table 12 Heuristic Data Illustrating 
the Context Specificity of GLM Weights 



set printback=listing blanks=sysmis undefined=warn . 

COMMENT 'AERA9801.SPS' . 

title 'Illustrate **Context Specificity** of GLM Weights' . 
data list 

file='c:\123\temp.prn' fixed records=1 table 
/1 id 1-3 grp 8 x1 14-15 x2 21-22 x3 28-29 x4 35-36 . 
list variables=all/cases=9999/ . 

subtitle '1 Discrim ***Smaller Variable Set***' . 
discriminant groups=grp(1,3)/variables=x1 to x3/ 
analysis=x1 to x3/ 

method=direct/priors=equal/save=scores(dscr)/ 
plot=cases/classify=pooled/ 

statistics=mean stddev gcov cov corr boxm coef table . 
variable label 

dscr1 'Discriminant score Func I 3 predictors' 
dscr2 'Discriminant score Func II 3 predictors' . 
execute . 

subtitle '2 Discrim ###Larger Variable Set###' . 
discriminant groups=grp(1,3)/variables=x1 to x4/ 
analysis=x1 to x4/ 

method=direct/priors=equal/save=scores(dscore)/ 
plot=cases/classify=pooled/ 

statistics=mean stddev gcov cov corr boxm coef table . 
variable label 

dscore1 'Discriminant score Func I 4 predictors' 
dscore2 'Discriminant score Func II 4 predictors' . 
execute . 
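
The two DISCRIMINANT runs above differ only in whether x4 is included, which is what exposes the context specificity of the weights. The same phenomenon can be sketched with the simplest member of the general linear model, ordinary least squares regression, used here as a stand-in for discriminant analysis. The data below are invented for illustration and are NOT the Table 12 heuristic data; the variable names are hypothetical.

```python
# Invented data in which y = x1 + x2 exactly and x1 correlates with x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [3, 3, 7, 7, 11]


def cross_product(u, v):
    """Sum of cross-products of u and v about their means."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


s11 = cross_product(x1, x1)
s22 = cross_product(x2, x2)
s12 = cross_product(x1, x2)
s1y = cross_product(x1, y)
s2y = cross_product(x2, y)

# Weight for x1 when it is the only predictor in the model.
b1_alone = s1y / s11

# Weights when x2 joins the model (closed-form two-predictor normal equations).
denominator = s11 * s22 - s12 ** 2
b1_joint = (s1y * s22 - s2y * s12) / denominator
b2_joint = (s2y * s11 - s1y * s12) / denominator

print(f"weight for x1 alone:   {b1_alone:.3f}")
print(f"weight for x1 with x2: {b1_joint:.3f}")
print(f"weight for x2:         {b2_joint:.3f}")
```

With these invented scores the weight for x1 is halved once x2 enters the model, even though the x1 scores themselves never change.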



APPENDIX E 

SPSS Program for the Heuristic Data (Tables 14, 16, and 18) 
Illustrating the Importance of 
Both Function and Structure Coefficients 



title 'AERA9803.SPS *************************************'. 
data list file=abc records=1 table/ 

ID 2-3 grp 8 x1 14-15 x2 21-22 x3 28-29 
list variables=all/cases=99 . 

subtitle '1 Uncorrelated Response Variables #############'. 
discriminant 

groups=grp(1,3)/variables=x1 to x3/analysis=x1 to x3/ 
method=direct/ 

statistics=mean stddev gcov cov boxm/ . 